Who feeds the world ? And how wealthy are they ?

There are folium maps in this notebook. If they do not display well, please have a look at the .html file from the same repository.

Abstract

Our main goal in this project is to find social and economic world-wide relations of countries based on the state of their agricultural sector, using indicators such as imports, exports, production, self-sufficiency, etc. In order to find such indicators, we would like to use the data from the "Global Food & Agriculture Statistics" datasets. First, we would like to produce a map showing which countries "feed the world" i.e. which countries are net-exporting food. That map would contain a slider to show how this evolved over the past fifty years. Then we would like to show countrywise the level of food self-sufficiency i.e. the way a country does not need to trade with other ones in order to feed its population. We will also compare it to nations' economic development and see if some correlations can be found. We will visualize our best findings with several interactive maps and plots.

Research questions

We would like to work on the following research questions:

  • How does the production and consumption of food look like from a geographical point of view ? Which countries are net food exporters or importers? How did this evolve over the last few decades ?
  • What's the level of self-sufficiency in food production of individual countries and how does this change over time ?
  • Is there a link between the GDP per capita and the agricultural trade balance ? Are countries that are net exporters or importers richer ? Are self-sufficient countries richer ?
  • If we find any relations, are they also still valid if we check for specific crops ? Are there some crops that are mostly produced by richer countries, some that are mostly produced by poorer countries?

External imports:

In [1]:
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import folium
import seaborn as sns
import json
import re
import requests
from bs4 import BeautifulSoup
from ipywidgets import interact
from IPython.display import display
import scipy.cluster.hierarchy as spc
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import Ridge
from operator import itemgetter
from sklearn import preprocessing
from tqdm import tqdm

Auxiliary function imports:

We have implemented some functions into a dedicated module (file Milestone_2_scripts.py) in order to simplify the code and make this notebook more enjoyable to read.

In [2]:
#from Milestone_2_scripts import *

Setup:

In [3]:
data_folder_path = "./Data/current_FAO/raw_files/"

files = {"Crops production" : "Production_Crops_E_All_Data_(Normalized).csv",
         "Food trade" : "Trade_Crops_Livestock_E_All_Data_(Normalized).csv", 
         "Consumer price indices" : "ConsumerPriceIndices_E_All_Data_(Normalized).csv",
         "Macroeconomy" : "Macro-Statistics_Key_Indicators_E_All_Data_(Normalized).csv",
         "Livestock production" : "Production_Livestock_E_All_Data_(Normalized).csv",
         "Live animals trade" : "Trade_LiveAnimals_E_All_Data_(Normalized).csv"
        }
interesting_datasets = files.keys()

1.A. Dataset description

Our main dataset would be a subset of the "Global Food & Agriculture Statistics" that is found in the proposed datasets list. In this dataset, we have seen that we could work with the production as well as import and export quantities per year and per country. We will add information about countries GDP to this database.

1.B. Loading the data set

In [4]:
def load_datasets(datasets) :
    df = {}
    for dataset in datasets :
        file_path = data_folder_path + files[dataset]
        df[dataset] = pd.read_csv(file_path, encoding = "ISO-8859-1")
    return df

We load each interresting dataset in the dictionary df :

In [5]:
df = load_datasets(interesting_datasets)

1.C. Understanding the data set

In this part, we will have a first look of the datasets in order to get a first sense of the data.

In [6]:
def display_df(df, datasets):
    for dataset in datasets :
        display(dataset, df[dataset].sample(5))

In order to see what does the datasets look like, we display a sample of 5 rows for each of them :

In [7]:
display_df(df, interesting_datasets)
'Crops production'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
2471712 5801 Least Developed Countries 89 Buckwheat 5510 Production 1996 1996 tonnes 5000.0 A
2532267 5803 Small Island Developing States 536 Plums and sloes 5419 Yield 1999 1999 hg/ha 36667.0 Fc
1479535 195 Senegal 56 Maize 5510 Production 1973 1973 tonnes 33800.0 NaN
1660503 176 Timor-Leste 56 Maize 5419 Yield 2005 2005 hg/ha 15900.0 Fc
558255 59 Egypt 497 Lemons and limes 5419 Yield 1994 1994 hg/ha 201572.0 Fc
'Food trade'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
5018195 101 Indonesia 266 Oil, castor beans 5910 Export Quantity 1967 1967 tonnes 0.0 NaN
5260410 105 Israel 577 Dates 5910 Export Quantity 1972 1972 tonnes 196.0 NaN
14390992 5815 Low Income Food Deficit Countries 89 Buckwheat 5610 Import Quantity 1967 1967 tonnes 0.0 A
12762797 5207 South America 1896 Tobacco 5622 Import Value 1989 1989 1000 US$ 44249.0 A
11393621 251 Zambia 620 Fruit, dried nes 5910 Export Quantity 2007 2007 tonnes 0.0 F
'Consumer price indices'
Area Code Area Item Code Item Months Code Months Year Code Year Unit Value Flag Note
23639 97 Hungary 23013 Consumer Prices, Food Indices (2010 = 100) 7010 October 2012 2012 NaN 143.200000 X 2006
50567 277 South Sudan 23013 Consumer Prices, Food Indices (2010 = 100) 7004 April 2009 2009 NaN 67.996000 X 2011M6
12319 46 Congo 23012 Consumer Prices, General Indices (2010 = 100) 7002 February 2001 2001 NaN 71.007209 X 2010
35772 142 Montserrat 23013 Consumer Prices, Food Indices (2010 = 100) 7007 July 2012 2012 NaN 284.190000 X 1984M4
8934 115 Cambodia 23013 Consumer Prices, Food Indices (2010 = 100) 7002 February 2011 2011 NaN 154.690000 NaN 2006M10-2006M12
'Macroeconomy'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
347855 159 Nigeria 22077 Value Added (Manufacture of food and beverages) 6114 Value Local Currency, 2005 prices 2009 2009 millions NaN NaN
516312 236 Venezuela (Bolivarian Republic of) 22011 Gross National Income 6110 Value US$ 2012 2012 millions 371207.423120 XAM
458692 211 Switzerland 22076 Value Added (Manufacture of food, beverages an... 6110 Value US$ 1983 1983 millions NaN NaN
350507 162 Norway 22078 Value Added (Manufacture of tobacco products) 6109 Value Local Currency 1977 1977 millions NaN NaN
32673 12 Bahamas 22015 Gross Fixed Capital Formation 6156 Annual growth Local Currency, 2005 prices 1976 1976 % 6.328285 Fc
'Livestock production'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
3644 8 Antigua and Barbuda 1746 Cattle and Buffaloes 5111 Stocks 1986 1986 Head 18000.0 A
50325 89 Guatemala 866 Cattle 5111 Stocks 2009 2009 Head 3215140.0 NaN
44398 74 Gabon 1034 Pigs 5111 Stocks 1996 1996 Head 180000.0 NaN
48073 85 Greenland 1096 Horses 5111 Stocks 1978 1978 Head NaN M
110788 210 Sweden 976 Sheep 5111 Stocks 1962 1962 Head 169450.0 NaN
'Live animals trade'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
614966 5403 Southern Europe 1057 Chickens 5922 Export Value 1981 1981 1000 US$ 13097.0 A
474501 226 Uganda 866 Cattle 5622 Import Value 1996 1996 1000 US$ 319.0 NaN
158034 60 El Salvador 866 Cattle 5622 Import Value 2004 2004 1000 US$ 10292.0 NaN
320441 150 Netherlands 976 Sheep 5608 Import Quantity 1993 1993 Head 56789.0 NaN
127197 98 Croatia 1034 Pigs 5922 Export Value 2007 2007 1000 US$ 179.0 NaN

At first glance, our datasets seem very clean.

Each of our dataset contains a column "Year" and a column that is either named "Area" or "Country". This is a great news for us since we want to do a both geographical and time-related analysis.

The columns "Area" and "Country" both correspond to the country except that the "Area" may contains a group of country (e.g. "Eastern Europe").

1.D. Cleansing the data set

In this part, we will clean the datasets. The final goal is to produce one uniformized and normalized dataset on which we could work (see 1.F).

Such a cleaned dataset may look like this (in a very simplistic way):

Country | Year | GDP | Crops production | Livestock production

1.D.a. Removing unuseful data

In this section, we will create dataframes in df_useful which correspond to previous dataframes without the unuseful data.

In [8]:
df_useful = {}
1.D.a.i. Extracting GDP from the "Macroeconomy" dataset

The "Macroeconomy" dataset contains many different measures: Gross Fixed Capital Formation, Gross National Income, Value Added (Total Manufacturing), ... We are only interested in Gross Domestic Product. Therefore, we extract it Gross Domestic Product from the "Macroeconomy" dataset. In order to have uniformisation among values, we choose the US$ value. All of them have the same unit (millions US$) so we can drop the "Unit" column as well.

In [9]:
def extract_GDP(df):
    def selection_GDP(df):
        return df['Item']=='Gross Domestic Product'
    def selection_US_dollars(df):
        return df['Element']=="Value US$"
    def drop_columns(df):
        dropped_colmuns = ["Item Code", "Item", "Element Code", "Element", "Flag", "Year Code", "Unit"]
        return df.drop(columns = dropped_colmuns)
    return drop_columns(df[selection_GDP(df)&selection_US_dollars(df)])
In [10]:
df_useful["GDP"] = extract_GDP(df["Macroeconomy"])

We can have have a look at a sample of the extrated dataset:

In [11]:
display(df_useful["GDP"].sample(5))
Area Code Area Year Value
306155 140 Monaco 1998 2.881551e+03
550693 5305 Western Asia 2006 1.709672e+06
358446 180 Palau 1980 2.566370e+01
408315 193 Sao Tome and Principe 1991 1.022105e+02
499906 215 United Republic of Tanzania 1977 6.331677e+03

And we can plot GDP in million US$ for different countries for the period 1970-2015:

In [12]:
select_switzerland = df_useful["GDP"]['Area']=='Switzerland'
select_france = df_useful["GDP"]['Area']=='France'
select_austria = df_useful["GDP"]['Area']=='Austria'
select_canada = df_useful["GDP"]['Area']=='Canada'
ax = df_useful["GDP"][select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful["GDP"][select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful["GDP"][select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful["GDP"][select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
Out[12]:
<matplotlib.legend.Legend at 0x1bcaf710c08>

For dissolute or new countries, we have some Nan values (before appearing or after dissolution) as in this next example :

In [13]:
select_USSR = df_useful["GDP"]['Area']=='USSR'
select_russia = df_useful["GDP"]['Area']=='Russian Federation'
select_ukraine = df_useful["GDP"]['Area']=='Ukraine'
ax = df_useful["GDP"][select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful["GDP"][select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful["GDP"][select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
Out[13]:
<matplotlib.legend.Legend at 0x1bcaf77aa48>
1.D.a.ii. Extracting crops harvested area, production, seed and yield from the "Crops production" dataset

We want to extract crops harvested area, production, seed and yield from the "Crops production" dataset. As all crops are not food crops, we request the World crops database to keep only the food crops.

In [14]:
def get_food_crops():
    #Return a list of crops categorized as food crops https://world-crops.com/food-crops/
    url="https://world-crops.com/food-crops/"
    r=requests.get(url,headers={"User-Agent": "XY"})
    soup=BeautifulSoup(r.text,'html.parser')
    elements_temp=soup.find_all('a',href=re.compile("^../"))
    elements=[el.text for el in elements_temp]
    
    #only 40 elements are displayed on each page->iterating on the total list
    for i in range(40,401,40):
        url_i=url+"?ss="+str(i)
        r=requests.get(url_i,headers={"User-Agent":"XY"})
        soup=BeautifulSoup(r.text,'html.parser')
        new_elements=soup.find_all('a',href=re.compile("^../"))
        elements+=[el.text for el in new_elements]
    return elements

def inclusive_search(string,elements):
    #returns true if the string can be found in elements. The search removes special characters from string in order to include more positive results
    string=string.lower()
    delimiters = ",", "(","&",")"," and "," "
    pattern = '|'.join(map(re.escape, delimiters))
    strings=list(filter(None,re.split(pattern,string)))
    found=False
    for s in strings:
        if s=="nes":
            continue
        for el in elements:
            found=(s in el.split())
            if found==False and s[-1]=="s":
                found=s[:-1] in el.split()
            if found==False and s[-2:]=="es":
                found=s[:-2] in el.split()
            if found==False and s[-3:]=="ies":
                found=s[:-3]+"y" in el.split()
            if found==True:
                return found
    return found


def get_food_crop_data(df):    
    #extracts the food crop data, returns 4 df: Area,Production,Seed and yield    
    df=df.copy()
    food_crops=list(map(lambda x: x.lower(),get_food_crops()))              
    crop_types_df=df[['Item','Value']].groupby('Item').sum()
    crop_types_df=crop_types_df[list(map(lambda x : inclusive_search(x,food_crops) , crop_types_df.index ))]   
    food_crop_df=df[df.Item.apply(lambda x: x in crop_types_df.index)]
    return (food_crop_df[food_crop_df.Element=='Area harvested'],
            food_crop_df[food_crop_df.Element=='Production'],
            food_crop_df[food_crop_df.Element=='Seed'],
            food_crop_df[food_crop_df.Element=='Yield'])
  
food_crop_area_df , food_crop_production_df , food_crop_seed_df , food_crop_yield_df = get_food_crop_data(df["Crops production"])
In [15]:
df_useful['Crops Area harvested'] = food_crop_area_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Crops Production'] = food_crop_production_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Crops Seed'] = food_crop_seed_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Crops Yield'] =  food_crop_yield_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])

We check everything is fine by looking at samples for each of the new dataframes:

In [16]:
display(df_useful['Crops Area harvested'].sample(5))
display(df_useful['Crops Production'].sample(5))
display(df_useful['Crops Seed'].sample(5))
display(df_useful['Crops Yield'].sample(5))
Area Code Area Item Year Unit Value
239460 233 Burkina Faso Groundnuts, with shell 1967 ha 142000.0
650881 75 Gambia Fruit, fresh nes 1979 ha NaN
1273423 165 Pakistan Eggplants (aubergines) 1998 ha 8190.0
189234 19 Bolivia (Plurinational State of) Roots and Tubers,Total 1965 ha 143045.0
1782981 231 United States of America Sugar beet 2014 ha 463900.0
Area Code Area Item Year Unit Value
838350 102 Iran (Islamic Republic of) Fruit excl Melons,Total 1998 tonnes 11649383.0
2134669 5207 South America Kiwi fruit 1992 tonnes 85000.0
2590771 5817 Net Food Importing Developing Countries Sugar beet 1976 tonnes 2808357.0
2093460 5204 Central America Lemons and limes 1994 tonnes 1001782.0
2215442 5303 Southern Asia Buckwheat 1969 tonnes 3800.0
Area Code Area Item Year Unit Value
716212 88 Guam Maize 2009 tonnes 0.0
2122478 5206 Caribbean Cereals (Rice Milled Eqv) 1965 tonnes 28032.0
2296236 5400 Europe Rye 1981 tonnes 2630157.0
636953 69 French Guiana Cereals (Rice Milled Eqv) 1987 tonnes 236.0
555985 59 Egypt Cow peas, dry 2014 tonnes 135.0
Area Code Area Item Year Unit Value
108403 12 Bahamas Citrus Fruit,Total 1997 hg/ha 207086.0
595975 62 Ethiopia PDR Millet 1972 hg/ha 5163.0
1187783 150 Netherlands Grapes 1971 hg/ha 279739.0
99865 52 Azerbaijan Currants 2009 hg/ha 35000.0
387895 44 Colombia Cabbages and other brassicas 1976 hg/ha 381809.0

We also make some plots to have a first understanding of the dataset:

In [17]:
select_Maize = df_useful['Crops Area harvested']['Item']=='Maize'
maize_df = df_useful['Crops Area harvested'][select_Maize]

select_switzerland = maize_df['Area']=='Switzerland'
select_france = maize_df['Area']=='France'
select_austria = maize_df['Area']=='Austria'
select_canada = maize_df['Area']=='Canada'
ax = maize_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
Out[17]:
<matplotlib.legend.Legend at 0x1bcb1214448>
In [18]:
select_USSR = maize_df['Area']=='USSR'
select_russia = maize_df['Area']=='Russian Federation'
select_ukraine = maize_df['Area']=='Ukraine'
ax = maize_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
Out[18]:
<matplotlib.legend.Legend at 0x1bcb1401fc8>
1.D.a.iii. Extracting stocks production from the "Livestock production" dataset

We want to extract stocks production from the "Livestock production" dataset. Again, we drop the columns that are useless for us and have a first look of the data with a sample and some plots.

In [19]:
selection_stocks = df['Livestock production']["Element"] == 'Stocks'
df_useful['Livestock production'] = df['Livestock production'][selection_stocks].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
In [20]:
display(df_useful['Livestock production'].sample(5))
Area Code Area Item Year Unit Value
56414 101 Indonesia Cattle 1961 Head 6356000.0
17975 29 Burundi Beehives 1972 No 85000.0
114518 176 Timor-Leste Goats 1987 Head 79053.0
32736 51 Czechoslovakia Turkeys 1989 1000 Head 1140.0
72197 134 Malta Chickens 1963 1000 Head 520.0
In [21]:
select_pigs = df_useful['Livestock production']['Item']=='Pigs'
pigs_df = df_useful['Livestock production'][select_pigs]

select_switzerland = pigs_df['Area']=='Switzerland'
select_france = pigs_df['Area']=='France'
select_austria = pigs_df['Area']=='Austria'
select_canada = pigs_df['Area']=='Canada'
ax = pigs_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
Out[21]:
<matplotlib.legend.Legend at 0x1bcb0d54048>
In [22]:
select_USSR = pigs_df['Area']=='USSR'
select_russia = pigs_df['Area']=='Russian Federation'
select_ukraine = pigs_df['Area']=='Ukraine'
ax = pigs_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
Out[22]:
<matplotlib.legend.Legend at 0x1bcaf86f8c8>
1.D.a.iv. Extracting import and export quantities from the "Live animals trade" and "Crops trade" datasets

Now, we extract import and export quantities from the "Live animals trade" and "Crops trade" datasets, having again some samples and some plots.

In [23]:
selection_import_quantities = df['Live animals trade']["Element"] == 'Import Quantity'
selection_export_quantities = df['Live animals trade']["Element"] == 'Export Quantity'

df_useful['Live animals import quantities'] = df['Live animals trade'][selection_import_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Live animals export quantities'] = df['Live animals trade'][selection_export_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
In [24]:
display(df_useful['Live animals import quantities'].sample(5))
Area Code Area Item Year Unit Value
410410 197 Sierra Leone Cattle 1990 Head 25000.0
336713 159 Nigeria Bovine, Animals 2010 Head 450000.0
569777 5206 Caribbean Goats 1975 Head 1611.0
16066 9 Argentina Sheep and Goats 1968 Head 151.0
63421 239 British Virgin Islands Horses 1995 Head 0.0
In [25]:
select_pigs = df_useful['Live animals import quantities']['Item']=='Pigs'
pigs_df = df_useful['Live animals import quantities'][select_pigs]

select_switzerland = pigs_df['Area']=='Switzerland'
select_france = pigs_df['Area']=='France'
select_austria = pigs_df['Area']=='Austria'
select_canada = pigs_df['Area']=='Canada'
ax = pigs_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
Out[25]:
<matplotlib.legend.Legend at 0x1bcc0511c08>
In [26]:
select_USSR = pigs_df['Area']=='USSR'
select_russia = pigs_df['Area']=='Russian Federation'
select_ukraine = pigs_df['Area']=='Ukraine'
ax = pigs_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
Out[26]:
<matplotlib.legend.Legend at 0x1bcb0f30d48>
In [27]:
display(df_useful['Live animals export quantities'].sample(5))
Area Code Area Item Year Unit Value
608999 5402 Northern Europe Pigeons, other birds 2003 1000 Head 1.0
151904 58 Ecuador Rabbits and hares 1969 1000 Head NaN
206555 91 Guyana Goats 1961 Head NaN
58110 20 Botswana Horses 1984 Head 6.0
369893 174 Portugal Ducks 1993 1000 Head NaN
In [28]:
select_pigs = df_useful['Live animals export quantities']['Item']=='Pigs'
pigs_df = df_useful['Live animals export quantities'][select_pigs]

select_switzerland = pigs_df['Area']=='Switzerland'
select_france = pigs_df['Area']=='France'
select_austria = pigs_df['Area']=='Austria'
select_canada = pigs_df['Area']=='Canada'
ax = pigs_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
Out[28]:
<matplotlib.legend.Legend at 0x1bcb091d808>
In [29]:
select_USSR = pigs_df['Area']=='USSR'
select_russia = pigs_df['Area']=='Russian Federation'
select_ukraine = pigs_df['Area']=='Ukraine'
ax = pigs_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
Out[29]:
<matplotlib.legend.Legend at 0x1bcb01da408>
In [30]:
selection_import_quantities = df['Food trade']["Element"] == 'Import Quantity'
selection_export_quantities = df['Food trade']["Element"] == 'Export Quantity'

df_useful['Food import quantities'] = df['Food trade'][selection_import_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Food export quantities'] = df['Food trade'][selection_export_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
In [31]:
display(df_useful['Food import quantities'].sample(5))
Area Code Area Item Year Unit Value
2131007 40 Chile Tobacco 1980 tonnes 3736.0
11770888 5000 World Hair, goat, coarse 1974 tonnes 25.0
1028816 23 Belize Soya sauce 1988 tonnes 0.0
7509061 159 Nigeria Silk raw 1999 tonnes 25.0
6931094 28 Myanmar Butter 1984 tonnes 0.0
In [32]:
select_Maize = df_useful['Food import quantities']['Item']=='Maize'
maize_df = df_useful['Food import quantities'][select_Maize]

select_switzerland = maize_df['Area']=='Switzerland'
select_france = maize_df['Area']=='France'
select_austria = maize_df['Area']=='Austria'
select_canada = maize_df['Area']=='Canada'
ax = maize_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
Out[32]:
<matplotlib.legend.Legend at 0x1bcb0a04408>
In [33]:
select_USSR = maize_df['Area']=='USSR'
select_russia = maize_df['Area']=='Russian Federation'
select_ukraine = maize_df['Area']=='Ukraine'
ax = maize_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
Out[33]:
<matplotlib.legend.Legend at 0x1bcb0d08648>
In [34]:
display(df_useful['Food export quantities'].sample(5))
Area Code Area Item Year Unit Value
13951988 5503 Micronesia Tobacco products nes 1963 tonnes 0.0
3169474 54 Denmark Meat, beef, preparations 1989 tonnes 10909.0
13263643 5400 Europe Fat, liver prepared (foie gras) 1985 tonnes 986.0
9799241 209 Swaziland Tallow 2011 tonnes 0.0
10580451 226 Uganda Groundnuts, shelled 1977 tonnes NaN
In [35]:
select_Maize = df_useful['Food export quantities']['Item']=='Maize'
maize_df = df_useful['Food export quantities'][select_Maize]

select_switzerland = maize_df['Area']=='Switzerland'
select_france = maize_df['Area']=='France'
select_austria = maize_df['Area']=='Austria'
select_canada = maize_df['Area']=='Canada'
ax = maize_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
Out[35]:
<matplotlib.legend.Legend at 0x1bcb0bca888>
In [36]:
select_USSR = maize_df['Area']=='USSR'
select_russia = maize_df['Area']=='Russian Federation'
select_ukraine = maize_df['Area']=='Ukraine'
ax = maize_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
Out[36]:
<matplotlib.legend.Legend at 0x1bcb0c146c8>
1.D.a.v. Extracting average CPI of each year from the "Consumer price indices" dataset

The "Consumer price indices" dataset contains monthly data. In order to have a uniform dataframe, and as other dataframes have yearly data, we group it by (Country, Year) and compute the mean over the monthes.

In [37]:
df_useful['Consumer price indices'] =  df['Consumer price indices'][['Area',"Year",'Value']] \
                                        .dropna() \
                                        .groupby(['Area',"Year"]) \
                                        .mean() \
                                        .reset_index() \
                                        .dropna()

With samples and plots, we remark that this dataset only start in 2000 wheareas other ones start in 1970.

In [38]:
display(df_useful['Consumer price indices'].sample(5))
Area Year Value
1181 Germany 2012 105.196005
102 Antigua and Barbuda 2013 108.083497
3183 United Arab Emirates 2015 105.291384
1552 Jordan 2003 69.642768
2824 South Sudan 2007 67.573301
In [39]:
select_switzerland = df_useful['Consumer price indices']['Area']=='Switzerland'
select_france = df_useful['Consumer price indices']['Area']=='France'
select_austria = df_useful['Consumer price indices']['Area']=='Austria'
select_canada = df_useful['Consumer price indices']['Area']=='Canada'
ax = df_useful['Consumer price indices'][select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful['Consumer price indices'][select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful['Consumer price indices'][select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful['Consumer price indices'][select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
Out[39]:
<matplotlib.legend.Legend at 0x1bcb0af7408>
In [40]:
select_russia = df_useful["Consumer price indices"]['Area']=='Russian Federation'
select_ukraine = df_useful["Consumer price indices"]['Area']=='Ukraine'
ax = df_useful["Consumer price indices"][select_russia].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful["Consumer price indices"][select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(['Russia', 'Ukraine'])
Out[40]:
<matplotlib.legend.Legend at 0x1bcb0ae18c8>
1.D.a.vi. Removing areas which are not countries

Having a more detailled look at the dataset, we have remarked that the areas which are real countries are exactely the ones with an "Area Code" below $5000$ but not in $[261, 269]$.

In [41]:
#remove Area code >= 5000 or in [261, 269] (EU)
for df_name in df_useful :
    if 'Area Code' in df_useful[df_name].keys() : 
        print ("Removing areas which are not countries in", df_name)
        selection_countries = df_useful[df_name]['Area Code']<261 
        selection_countries = selection_countries | df_useful[df_name]['Area Code']>269
        selection_countries = selection_countries & df_useful[df_name]['Area Code']<5000
        df_useful[df_name] = df_useful[df_name][selection_countries]
        display(df_useful[df_name].sample(5))
    else :
        print (df_name, "is already clean")
Removing areas which are not countries in GDP
Area Code Area Year Value
263508 122 Lesotho 2013 2218.112442
61266 20 Botswana 1976 296.963130
130350 167 Czechia 2006 155213.120560
391135 185 Russian Federation 1979 NaN
134059 116 Democratic People's Republic of Korea 1978 9116.107415
Removing areas which are not countries in Crops Area harvested
Area Code Area Item Year Unit Value
860686 105 Israel Beans, green 2011 ha 950.0
2197 2 Afghanistan Maize 2012 ha 141000.0
1789729 234 Uruguay Fruit, fresh nes 1969 ha NaN
956114 118 Kuwait Eggplants (aubergines) 1980 ha 9.0
303596 39 Chad Maize 1974 ha 5600.0
Removing areas which are not countries in Crops Production
Area Code Area Item Year Unit Value
2458097 5706 European Union Peaches and nectarines 1967 tonnes 2194704.0
237654 27 Bulgaria Vegetables&Melons, Total 1965 tonnes 1658513.0
579527 61 Equatorial Guinea Coffee, green 2009 tonnes 4000.0
1088813 137 Mauritius Coconuts 2001 tonnes 1500.0
902168 109 Jamaica Roots and Tubers,Total 2014 tonnes 235063.0
Removing areas which are not countries in Crops Seed
Area Code Area Item Year Unit Value
1517959 25 Solomon Islands Rice, paddy 1966 tonnes 13.0
1083248 136 Mauritania Maize 1994 tonnes 149.0
2478729 5801 Least Developed Countries Lentils 1973 tonnes 12783.0
1560327 203 Spain Wheat 1976 tonnes 395000.0
861137 105 Israel Broad beans, horse beans, dry 2012 tonnes 832.0
Removing areas which are not countries in Crops Yield
Area Code Area Item Year Unit Value
2452076 5706 European Union Fruit, tropical fresh nes 1971 hg/ha 153875.0
1915161 5100 Africa Apricots 2009 hg/ha 79342.0
203740 21 Brazil Grapes 2002 hg/ha 173250.0
2268792 5305 Western Asia Oranges 2012 hg/ha 194987.0
756059 93 Haiti Cabbages and other brassicas 1999 hg/ha 60502.0
Removing areas which are not countries in Livestock production
Area Code Area Item Year Unit Value
56872 101 Indonesia Poultry Birds 1987 1000 Head 452581.0
14845 21 Brazil Chickens 2013 1000 Head 1246638.0
51425 175 Guinea-Bissau Asses 1975 Head 2000.0
35964 56 Dominican Republic Poultry Birds 2014 1000 Head 168000.0
145901 5305 Western Asia Camels 1990 Head 957545.0
Removing areas which are not countries in Live animals import quantities
Area Code Area Item Year Unit Value
52953 19 Bolivia (Plurinational State of) Asses 1973 Head 0.0
585756 5302 Eastern Asia Sheep 1980 Head 1485.0
317881 150 Netherlands Asses 1977 Head 0.0
302343 141 Mongolia Sheep and Goats 1965 Head 0.0
361916 171 Philippines Beehives 1966 No NaN
Removing areas which are not countries in Live animals export quantities
Area Code Area Item Year Unit Value
506376 249 Yemen Horses 1961 Head 0.0
593086 5304 South-Eastern Asia Ducks 1996 1000 Head 9006.0
378167 117 Republic of Korea Sheep 1999 Head NaN
406474 195 Senegal Turkeys 1997 1000 Head NaN
551116 5105 Western Africa Animals live nes 1970 Head 0.0
Removing areas which are not countries in Food import quantities
Area Code Area Item Year Unit Value
3945052 70 French Polynesia Buckwheat 2010 tonnes 0.0
13423996 5402 Northern Europe Beeswax 2001 tonnes 755.0
6286540 131 Malaysia Cotton lint 1999 tonnes 83603.0
3042932 167 Czechia Wafers 1993 tonnes NaN
7133768 150 Netherlands Manila fibre (abaca) 1986 tonnes 203.0
Removing areas which are not countries in Food export quantities
Area Code Area Item Year Unit Value
2106432 40 Chile Rice - total (Rice milled equivalent) 1997 tonnes 5.0
5522911 110 Japan Plums and sloes 1964 tonnes 0.0
7181985 150 Netherlands Wine+Vermouth+Sim. 1973 tonnes 7676.0
5375210 106 Italy Pineapples canned 1974 tonnes 148.0
12488913 5203 Northern America Sweet corn prep or preserved 2009 tonnes 123018.0
Consumer price indices is already clean

1.D.b. Handling of the missing data

In this section, we will explain how we will handle the missing data in previous dataframes for maps.

1.D.b.i. Highlighting the problem
In [42]:
select_USSR = df_useful["GDP"]['Area']=='USSR'
select_russia = df_useful["GDP"]['Area']=='Russian Federation'
select_ukraine = df_useful["GDP"]['Area']=='Ukraine'
ax = df_useful["GDP"][select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful["GDP"][select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful["GDP"][select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
Out[42]:
<matplotlib.legend.Legend at 0x1bcafffb088>

In order to vizualize folium maps, we need to associate each country a value. The geojson file that we use is not timestamped and only countries that exist nowadays are inside it. As some countries has been dissolved during the past 50 years, our folium maps won't be complete. For instance, we do not have any value for Ukraine from 1970 to 1989. Our idea to fix this issue is presented in the next paragraph.

1.D.b.ii. Proposed correction

Our idea is to map the former country value to each of the current ones. For instance in 1982, USSR GDP is around one trillion $. Therefore, if we associate (only for folium map purposes) this value to each current country that succeeded USSR, all these countries will appear the same color in the folium map, i.e. all the USSR area will appear the same color (and the good one).

In order to do so, one need to identify which countries appeared and disappeared from the dataset and at which year. Then we will use this result along with some historical research in our visualise_world_data_folium function (1Ea).

In [43]:
countries_formation_years = {}
for country in df_useful["GDP"]["Area"].unique():
    selection = df_useful["GDP"]["Area"] == country
    year_in, year_out = df_useful["GDP"][selection].dropna()["Year"].min(), df_useful["GDP"][selection].dropna()["Year"].max()
    for year in (year_in, year_out):
        if year not in countries_formation_years :
            countries_formation_years[year] = []
    countries_formation_years[year_in].append((country,'+'))
    countries_formation_years[year_out].append((country,'-'))

countries_formation_years.pop(1970)
countries_formation_years.pop(2015)
for year in sorted(list(countries_formation_years)):
    print (year, countries_formation_years[year])
1988 [('Yemen Ar Rp', '-'), ('Yemen Dem', '-')]
1989 [('Czechoslovakia', '-'), ('Ethiopia PDR', '-'), ('USSR', '-'), ('Yemen', '+'), ('Yugoslav SFR', '-')]
1990 [('Armenia', '+'), ('Azerbaijan', '+'), ('Belarus', '+'), ('Bosnia and Herzegovina', '+'), ('Croatia', '+'), ('Czechia', '+'), ('Eritrea', '+'), ('Estonia', '+'), ('Ethiopia', '+'), ('Georgia', '+'), ('Kazakhstan', '+'), ('Kyrgyzstan', '+'), ('Latvia', '+'), ('Lithuania', '+'), ('Montenegro', '+'), ('Republic of Moldova', '+'), ('Russian Federation', '+'), ('Serbia', '+'), ('Slovakia', '+'), ('Slovenia', '+'), ('Tajikistan', '+'), ('The former Yugoslav Republic of Macedonia', '+'), ('Timor-Leste', '+'), ('Turkmenistan', '+'), ('Ukraine', '+'), ('Uzbekistan', '+'), ('Central Asia', '+')]
1999 [('Kosovo', '+')]
2005 [('Curaçao', '+'), ('Sint Maarten (Dutch Part)', '+')]
2007 [('Sudan (former)', '-')]
2008 [('South Sudan', '+'), ('Sudan', '+')]
2012 [('Netherlands Antilles (former)', '-')]

1.E. Preprocessing the data set

In this part, we will finish prepocessing the datasets. More precisely, we will deal with country names and normalizing the features.

1.E.a. Converting country names between different naming conventions

Some countries have different names in the geojson file and in the dataset. We first start by correcting them.

In [44]:
dic = {'Czechia': "Czech Republic",
       'Russian Federation':'Russia',
       "Serbia":"Republic of Serbia",
       'The former Yugoslav Republic of Macedonia':'Macedonia',
       'China, mainland':'China',
       'Viet Nam':'Vietnam',
       'Venezuela (Bolivarian Republic of)':'Venezuela',
       'Iran (Islamic Republic of)':'Iran',
       'Syrian Arab Republic':"Syria",
       'Bolivia (Plurinational State of)': 'Bolivia',
       "Côte d'Ivoire": "Ivory Coast",
       'Congo':"Republic of the Congo",
       "Lao People's Democratic Republic":'Laos',
       "Democratic People's Republic of Korea":"North Korea",
       'Republic of Korea':"South Korea"}

def correct_country_names(old_name):
    if old_name in dic.keys() :
        return dic[old_name]
    return old_name
In [45]:
for df_name in df_useful :
    print (df_name)
    df_useful[df_name]["Area"] = df_useful[df_name]["Area"].apply(correct_country_names)
GDP
Crops Area harvested
Crops Production
Crops Seed
Crops Yield
Livestock production
Live animals import quantities
Live animals export quantities
Food import quantities
Food export quantities
Consumer price indices

Then, we do a function that takes as input a dataframe and a year and produces the corresponding folium map. This function also handles dissolutions of countries as suggested before.

In [46]:
def visualise_world_data_folium(df, year, logScale=True):
    dic = {'USSR':                            ['Armenia', 'Azerbaijan','Belarus', 'Estonia', 'Georgia',
                                               'Kazakhstan', 'Kyrgyzstan', 'Latvia', 'Lithuania',
                                               'Montenegro', 'Republic of Moldova', 'Russia',
                                               'Republic of Serbia', 'Timor-Leste', 'Turkmenistan', 'Ukraine',
                                               'Uzbekistan'],
           'Ethiopia PDR':                     ['Eritrea','Ethiopia'],
           'Yugoslav SFR':                     ['Kosovo', 'Slovenia', 'Croatia',
                                                'Macedonia', 'Bosnia and Herzegovina'],
           'Yemen Dem' :                       ['Yemen'],        
           'Czechoslovakia':                   ["Czech Republic", 'Slovakia'],
           'Netherlands Antilles (former)':    ['Curaçao', 'Sint Maarten (Dutch Part)'],
           'Sudan (former)':                   ['South Sudan', 'Sudan']
          }
    def add_new_names(old_name):
        if old_name in dic.keys() :
            return dic[old_name]
        return old_name
    to_plot=df[df["Year"]==year]
    to_plot=(to_plot[['Area','Value']]
             .dropna()
             .groupby('Area')             
             .mean()
             .reset_index()
             .dropna())    
    to_plot['Area']=to_plot['Area'].apply(add_new_names)
    to_plot = to_plot.explode('Area')
    if logScale :
        to_plot.Value=np.log10(to_plot.Value)
    
    m = folium.Map(location=[40,-10],zoom_start=1.6)
    folium.Choropleth(
        geo_data=f"https://raw.githubusercontent.com/python-visualization/folium/master/examples/data/world-countries.json",
        data=to_plot,
        columns=['Area', 'Value'],
        key_on='feature.properties.name',
        fill_color='YlGn',fill_opacity=0.7,line_opacity=0.2,nan_fill_opacity=0.0
    ).add_to(m)

    folium.LayerControl().add_to(m)

    return(m)

We can know use it to produce some maps. For instance, we plot below the map of GDP for the year 1985 (befor edissolution of USSR).

In [47]:
display(visualise_world_data_folium(df_useful["GDP"], 1985, True))

1.E.b. Normalization and log scales

Some of our features seem to be right skewed. At first glance it seems that they look like power laws.

For instance the distribution of GDP look a bit like a power law:

In [48]:
sns.distplot(df_useful["GDP"]["Value"], rug=False, hist=False)
Out[48]:
<matplotlib.axes._subplots.AxesSubplot at 0x1bcb088ee48>

As we later want to train some Machine Learning models, we log those values so that their distribution look a bit more like a normal distribution.

In [49]:
#looks better with log scale
sns.distplot(np.log(df_useful["GDP"]["Value"]), rug=False, hist=False)
Out[49]:
<matplotlib.axes._subplots.AxesSubplot at 0x1bcb0948ac8>

The new distribution indeed looks better to train models on it.

1.F. Making one uniformized dataframe

In this part, we will make one uniformized dataframe uni_df with the following columns.

Country | Year | GDP | Crops production columns | Livestock production columns | Crops importation columns | Livestock importation columns | Crops exportation columns | Livestock exportation columns | CPI

In this uniformized dataframe, a tuple (Country, Year) uniquely identifies a row.

1.F.a. Pivoting dataframes with items

The current dataframes have several rows for a given (Country, Year). Each of this row correspond to one item. We would like to have a unique row for a given (Country, Year) and one column per item:

In [50]:
need_pivot = ['Crops Area harvested',
              'Crops Production',
              'Crops Seed',
              'Crops Yield',
              'Livestock production',
              'Live animals import quantities',
              'Live animals export quantities',
              'Food import quantities',
              'Food export quantities']

def rename_columns(x, word):
    if x not in ['Area', 'Year', 'ha', 'tonnes', 'hg/ha', 'Head', '1000 Head']:
        return x + ' ' + word
    return x

df_useful['GDP'] = df_useful['GDP'].rename(columns = {'Value':'(GDP, million $)'})[["Area",'Year','(GDP, million $)']]
df_useful['Consumer price indices'] = df_useful['Consumer price indices'].rename(columns = {'Value':'(Consumer price indices, %)'})[["Area",'Year','(Consumer price indices, %)']]

for df_name in need_pivot :
    df_useful[df_name] = pd.pivot_table(df_useful[df_name], index=["Area",'Year'], columns=["Item","Unit"], values="Value").rename(columns=lambda x: rename_columns(x, df_name))
    display(df_useful[df_name].sample(5))
Item Anise, badian, fennel, coriander Crops Area harvested Apples Crops Area harvested Apricots Crops Area harvested Areca nuts Crops Area harvested Artichokes Crops Area harvested Asparagus Crops Area harvested Avocados Crops Area harvested Bambara beans Crops Area harvested Bananas Crops Area harvested Barley Crops Area harvested ... Sweet potatoes Crops Area harvested Tangerines, mandarins, clementines, satsumas Crops Area harvested Taro (cocoyam) Crops Area harvested Tomatoes Crops Area harvested Tung nuts Crops Area harvested Vegetables&Melons, Total Crops Area harvested Vetches Crops Area harvested Watermelons Crops Area harvested Wheat Crops Area harvested Yams Crops Area harvested
Unit ha ha ha ha ha ha ha ha ha ha ... ha ha ha ha ha ha ha ha ha ha
Area Year
Suriname 1998 NaN NaN NaN NaN NaN NaN NaN NaN 2152.0 NaN ... 38.0 NaN NaN 109.0 NaN 1698.0 NaN 163.0 NaN NaN
Micronesia (Federated States of) 1995 NaN NaN NaN NaN NaN NaN NaN NaN 385.0 NaN ... 510.0 NaN NaN NaN NaN 280.0 NaN NaN NaN NaN
Dominican Republic 1972 NaN NaN NaN NaN NaN NaN NaN NaN 20000.0 NaN ... 12000.0 NaN NaN 3100.0 NaN 14782.0 NaN NaN NaN 3500.0
Botswana 1998 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN 3255.0 NaN NaN 300.0 NaN
Western Asia 1991 22738.0 158320.0 70627.0 NaN 2354.0 49.0 8155.0 NaN 16335.0 7214987.0 ... 155.0 38338.0 239.0 282099.0 NaN 1475766.0 305907.0 243521.0 13851248.0 NaN

5 rows × 120 columns

Item Anise, badian, fennel, coriander Crops Production Apples Crops Production Apricots Crops Production Areca nuts Crops Production Artichokes Crops Production Asparagus Crops Production Avocados Crops Production Bambara beans Crops Production Bananas Crops Production Barley Crops Production ... Sweet potatoes Crops Production Tangerines, mandarins, clementines, satsumas Crops Production Taro (cocoyam) Crops Production Tomatoes Crops Production Tung nuts Crops Production Vegetables&Melons, Total Crops Production Vetches Crops Production Watermelons Crops Production Wheat Crops Production Yams Crops Production
Unit tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes ... tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes
Area Year
Bolivia 1967 NaN 17200.0 NaN NaN NaN NaN 1830.0 NaN 182500.0 56000.0 ... 11100.0 17000.0 NaN 71700.0 NaN 315600.0 NaN NaN 27000.0 NaN
Montserrat 1970 NaN NaN NaN NaN NaN NaN NaN NaN 120.0 NaN ... 7.0 NaN NaN 85.0 NaN 213.0 NaN NaN NaN NaN
Niue 1962 NaN NaN NaN NaN NaN NaN NaN NaN 150.0 NaN ... 380.0 NaN 700.0 NaN NaN 105.0 NaN NaN NaN 100.0
Cameroon 2013 NaN NaN 854.0 NaN NaN NaN 71776.0 36639.0 1600231.0 NaN ... 348618.0 NaN 1550973.0 875700.0 NaN 2497412.0 NaN 48266.0 850.0 556647.0
Malaysia 2012 NaN NaN NaN 338.0 NaN NaN NaN NaN 289034.0 NaN ... 55838.0 NaN NaN 129572.0 NaN 1281247.0 NaN 225681.0 NaN NaN

5 rows × 122 columns

Item Anise, badian, fennel, coriander Crops Seed Bambara beans Crops Seed Bananas Crops Seed Barley Crops Seed Beans, dry Crops Seed Broad beans, horse beans, dry Crops Seed Buckwheat Crops Seed Cabbages and other brassicas Crops Seed Carrots and turnips Crops Seed Cassava Crops Seed ... Sorghum Crops Seed Soybeans Crops Seed Sugar cane Crops Seed Sweet potatoes Crops Seed Taro (cocoyam) Crops Seed Vegetables&Melons, Total Crops Seed Vetches Crops Seed Watermelons Crops Seed Wheat Crops Seed Yams Crops Seed
Unit tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes ... tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes
Area Year
Asia 2002 4.0 NaN NaN 1606031.0 591634.0 130678.0 90718.0 NaN NaN NaN ... 333309.0 1481863.0 21323856.0 207713.0 36085.0 31169.0 27292.0 16036.0 13877927.0 14570.0
Niger 1981 NaN 0.0 NaN NaN 0.0 NaN NaN NaN NaN NaN ... 22692.0 NaN NaN NaN NaN NaN NaN NaN 105.0 NaN
Polynesia 2013 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN 26.0 NaN NaN NaN NaN
Burkina Faso 1979 NaN 360.0 NaN NaN NaN NaN NaN NaN NaN NaN ... 17221.0 0.0 10800.0 NaN NaN NaN NaN NaN NaN NaN
Cyprus 1975 NaN NaN NaN 5966.0 46.0 279.0 NaN NaN NaN NaN ... NaN NaN NaN NaN 762.0 NaN 692.0 NaN 4376.0 NaN

5 rows × 52 columns

Item Anise, badian, fennel, coriander Crops Yield Apples Crops Yield Apricots Crops Yield Areca nuts Crops Yield Artichokes Crops Yield Asparagus Crops Yield Avocados Crops Yield Bambara beans Crops Yield Bananas Crops Yield Barley Crops Yield ... Sweet potatoes Crops Yield Tangerines, mandarins, clementines, satsumas Crops Yield Taro (cocoyam) Crops Yield Tomatoes Crops Yield Tung nuts Crops Yield Vegetables&Melons, Total Crops Yield Vetches Crops Yield Watermelons Crops Yield Wheat Crops Yield Yams Crops Yield
Unit hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha ... hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha
Area Year
Vanuatu 1966 NaN NaN NaN NaN NaN NaN NaN NaN 122222.0 NaN ... NaN NaN NaN NaN NaN 150000.0 NaN NaN NaN NaN
China 2011 12147.0 165269.0 37099.5 28142.0 68182.0 54923.5 65625.0 NaN 268758.5 31996.0 ... 216457.0 92973.0 183519.0 514577.0 27703.0 233061.5 NaN 381508.5 48375.0 NaN
Central America 1994 74257.0 79997.0 33861.0 NaN 73462.0 37113.0 87440.0 NaN 343514.0 26377.0 ... 176489.0 114777.0 126496.0 184045.0 NaN 132865.0 146595.0 166800.0 42703.0 70729.0
Niger 1962 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 124029.0 NaN NaN 100000.0 NaN 112269.0 NaN NaN 12004.0 NaN
Least Developed Countries 1976 118571.0 51759.0 78756.0 7290.0 NaN NaN 185068.0 7710.0 62172.0 10788.0 ... 49189.0 71734.0 46220.0 82218.0 5414.0 60571.0 5270.0 150724.0 11686.0 66938.0

5 rows × 120 columns

Item Animals live nes Livestock production Asses Livestock production Beehives Livestock production Buffaloes Livestock production Camelids, other Livestock production Camels Livestock production Cattle Livestock production Cattle and Buffaloes Livestock production Chickens Livestock production Ducks Livestock production ... Horses Livestock production Mules Livestock production Pigeons, other birds Livestock production Pigs Livestock production Poultry Birds Livestock production Rabbits and hares Livestock production Rodents, other Livestock production Sheep Livestock production Sheep and Goats Livestock production Turkeys Livestock production
Unit Head Head No Livestock production Head Head Head Head Head 1000 Head 1000 Head ... Head Head 1000 Head Head 1000 Head 1000 Head 1000 Head Head Head 1000 Head
Area Year
Albania 1980 NaN 80000.0 110000.0 1700.0 NaN NaN 606000.0 607700.0 3492.0 NaN ... 45800.0 19500.0 NaN 174300.0 3492.0 NaN NaN 1254000.0 2065000.0 NaN
Paraguay 1965 NaN 21403.0 69000.0 NaN NaN NaN 4400000.0 4400000.0 6296.0 261.0 ... 380000.0 11888.0 NaN 810000.0 6635.0 NaN NaN 438000.0 486000.0 43.0
Cabo Verde 2013 NaN 15300.0 NaN NaN NaN NaN 22752.0 22752.0 1117.0 NaN ... 530.0 1860.0 NaN 84559.0 1117.0 NaN NaN 11599.0 201478.0 NaN
South Africa 1965 NaN 330000.0 48000.0 NaN NaN NaN 12500000.0 12500000.0 20500.0 168.0 ... 360000.0 43000.0 NaN 1400000.0 20984.0 NaN NaN 37000000.0 42341000.0 220.0
Kenya 1998 NaN NaN 2470000.0 NaN NaN 799500.0 11687000.0 11687000.0 29718.0 NaN ... 2100.0 NaN NaN 358462.0 29718.0 339.0 NaN 7043582.0 16717963.0 NaN

5 rows × 22 columns

Item Animals live nes Live animals import quantities Asses Live animals import quantities Beehives Live animals import quantities Bovine, Animals Live animals import quantities Buffaloes Live animals import quantities Camelids, other Live animals import quantities Camels Live animals import quantities Cattle Live animals import quantities Chickens Live animals import quantities ... Mules Live animals import quantities Pigeons, other birds Live animals import quantities Pigs Live animals import quantities Rabbits and hares Live animals import quantities Rodents, other Live animals import quantities Sheep Live animals import quantities Sheep and Goats Live animals import quantities Turkeys Live animals import quantities
Unit Head Head No Live animals import quantities Head Head Head Head Head 1000 Head Head ... Head 1000 Head Head Head 1000 Head 1000 Head Head Head 1000 Head Head
Area Year
Kuwait 1986 NaN NaN NaN 6435.0 NaN NaN 0.0 6435.0 2783.0 NaN ... NaN NaN NaN NaN NaN NaN 2412062.0 2412062.0 NaN NaN
Saudi Arabia 2002 NaN 0.0 NaN 38869.0 NaN NaN 20600.0 38869.0 1243.0 NaN ... 0.0 0.0 NaN NaN 0.0 NaN 5342806.0 5588326.0 NaN NaN
Europe 2002 0.0 4832.0 387.0 3319935.0 0.0 NaN NaN 3319935.0 557143.0 NaN ... 58136.0 51.0 NaN 9873614.0 4185.0 NaN 3834762.0 3881658.0 22224.0 NaN
Spain 1984 NaN 0.0 NaN 35918.0 NaN NaN NaN 35918.0 1384.0 NaN ... 11.0 0.0 NaN 1110.0 0.0 NaN 798.0 1036.0 0.0 NaN
Angola 1990 NaN NaN NaN 0.0 NaN NaN NaN 0.0 90.0 NaN ... NaN NaN NaN 0.0 NaN NaN 0.0 0.0 NaN NaN

5 rows × 24 columns

Item Animals live nes Live animals export quantities Asses Live animals export quantities Beehives Live animals export quantities Bovine, Animals Live animals export quantities Buffaloes Live animals export quantities Camelids, other Live animals export quantities Camels Live animals export quantities Cattle Live animals export quantities Chickens Live animals export quantities ... Mules Live animals export quantities Pigeons, other birds Live animals export quantities Pigs Live animals export quantities Rabbits and hares Live animals export quantities Rodents, other Live animals export quantities Sheep Live animals export quantities Sheep and Goats Live animals export quantities Turkeys Live animals export quantities
Unit Head Head No Live animals export quantities Head Head Head Head Head 1000 Head Head ... Head 1000 Head Head Head 1000 Head 1000 Head Head Head 1000 Head Head
Area Year
Burkina Faso 1963 NaN NaN NaN 112234.0 NaN NaN NaN 112234.0 963.0 NaN ... NaN NaN NaN 270.0 NaN NaN 153656.0 267656.0 NaN NaN
Algeria 1996 0.0 NaN 0.0 NaN NaN NaN NaN NaN 0.0 NaN ... 0.0 NaN NaN 0.0 NaN NaN 0.0 0.0 NaN NaN
Guinea 1978 NaN NaN NaN 30000.0 NaN NaN NaN 30000.0 NaN NaN ... NaN NaN NaN NaN NaN NaN 27000.0 49000.0 NaN NaN
Cuba 1987 NaN NaN NaN 999.0 NaN NaN NaN 999.0 652.0 NaN ... NaN NaN NaN 0.0 NaN NaN NaN NaN NaN NaN
Saint Kitts and Nevis 1988 NaN NaN NaN 0.0 NaN NaN NaN 0.0 NaN NaN ... NaN NaN NaN 0.0 NaN NaN 150.0 150.0 NaN NaN

5 rows × 24 columns

Item Alfalfa meal and pellets Food import quantities Almonds shelled Food import quantities Animal Oil+Fat+Grs Food import quantities Animal Vegetable Oil Food import quantities Animal fats Food import quantities Anise, badian, fennel, coriander Food import quantities Apples Food import quantities Apricots Food import quantities Apricots, dry Food import quantities Artichokes Food import quantities ... Wheat+Flour,Wheat Equivalent Food import quantities Whey, Pres+Concen Food import quantities Whey, condensed Food import quantities Whey, dry Food import quantities Wine Food import quantities Wine+Vermouth+Sim. Food import quantities Wool, degreased Food import quantities Wool, greasy Food import quantities Wool, hair waste Food import quantities Yoghurt, concentrated or not Food import quantities
Unit tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes ... tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes
Area Year
Gabon 2011 0.0 2.0 10.0 29015.0 10.0 0.0 2513.0 0.0 0.0 1.0 ... 94575.0 NaN NaN NaN 9220.0 9368.0 NaN NaN 14.0 72.0
Iran 1973 NaN NaN 17991.0 133328.0 17991.0 17.0 5758.0 NaN NaN NaN ... 785270.0 0.0 NaN 0.0 29.0 31.0 7935.0 4839.0 25.0 NaN
Guam 1967 NaN NaN NaN 0.0 NaN NaN 362.0 NaN NaN NaN ... 0.0 NaN NaN NaN 0.0 0.0 NaN NaN NaN NaN
Djibouti 2002 NaN NaN 0.0 19068.0 0.0 48.0 144.0 NaN NaN NaN ... 240882.0 NaN NaN NaN 167.0 171.0 NaN 0.0 NaN 1.0
Europe 2009 313921.0 317867.0 1200215.0 23879075.0 1200215.0 52652.0 4343799.0 223886.0 103901.0 43765.0 ... 38606765.0 1358113.0 402664.0 955449.0 5927089.0 6258923.0 97399.0 88950.0 20662.0 1244814.0

5 rows × 454 columns

Item Alfalfa meal and pellets Food export quantities Almonds shelled Food export quantities Animal Oil+Fat+Grs Food export quantities Animal Vegetable Oil Food export quantities Animal fats Food export quantities Anise, badian, fennel, coriander Food export quantities Apples Food export quantities Apricots Food export quantities Apricots, dry Food export quantities Artichokes Food export quantities ... Wheat+Flour,Wheat Equivalent Food export quantities Whey, Pres+Concen Food export quantities Whey, condensed Food export quantities Whey, dry Food export quantities Wine Food export quantities Wine+Vermouth+Sim. Food export quantities Wool, degreased Food export quantities Wool, greasy Food export quantities Wool, hair waste Food export quantities Yoghurt, concentrated or not Food export quantities
Unit tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes ... tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes
Area Year
Niue 2006 NaN NaN NaN 0.0 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Argentina 1986 0.0 0.0 27822.0 1929331.0 27822.0 220.0 133859.0 3.0 76.0 NaN ... 4076307.0 0.0 NaN 0.0 20778.0 20959.0 27663.0 35966.0 4008.0 0.0
Singapore 1996 2303.0 315.0 1363.0 468852.0 1363.0 10772.0 18433.0 12.0 58.0 2.0 ... 128663.0 3868.0 NaN 3868.0 1987.0 2130.0 0.0 8.0 0.0 56.0
Sao Tome and Principe 1990 NaN NaN NaN 0.0 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Malaysia 2013 70.0 71.0 243.0 20654632.0 243.0 636.0 1840.0 0.0 1.0 0.0 ... 85136.0 2297.0 0.0 2297.0 2317.0 2427.0 0.0 0.0 449.0 1080.0

5 rows × 445 columns

Some Nan values have appeared. As we have datasets that have great outliers, we would replace those Nan values by the median of the column:

In [51]:
# Deal with the NaN that appeared
for df_name in df_useful :
    for column in list(df_useful[df_name]):
        if column not in ['Area', 'Year']:
            df_useful[df_name][column].fillna(df_useful[df_name][column].median(), inplace=True)
In [52]:
#removing the multiindex, so that merge is clean with GDP and CPI
for df_name in need_pivot :
    df_useful[df_name].columns = [' '.join([str(_) for _ in v]) for v in df_useful[df_name].columns.values]
    display(df_useful[df_name].sample(5))
    
Anise, badian, fennel, coriander Crops Area harvested ha Apples Crops Area harvested ha Apricots Crops Area harvested ha Areca nuts Crops Area harvested ha Artichokes Crops Area harvested ha Asparagus Crops Area harvested ha Avocados Crops Area harvested ha Bambara beans Crops Area harvested ha Bananas Crops Area harvested ha Barley Crops Area harvested ha ... Sweet potatoes Crops Area harvested ha Tangerines, mandarins, clementines, satsumas Crops Area harvested ha Taro (cocoyam) Crops Area harvested ha Tomatoes Crops Area harvested ha Tung nuts Crops Area harvested ha Vegetables&Melons, Total Crops Area harvested ha Vetches Crops Area harvested ha Watermelons Crops Area harvested ha Wheat Crops Area harvested ha Yams Crops Area harvested ha
Area Year
Occupied Palestinian Territory 1969 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 35395.0 6900.0 233786.5 ... 4576.0 4800.0 3600.0 4760.0 7750.0 0.0 10844.0 7298.5 283291.0 5590.0
Israel 2004 3100.0 4480.0 2300.0 49401.0 670.0 34.0 5100.0 35395.0 2600.0 7380.0 ... 650.0 4900.0 3600.0 5820.0 7750.0 61997.0 10844.0 10760.0 70510.0 5590.0
Peru 2001 3100.0 10601.0 45.0 49401.0 411.0 19038.0 10263.0 35395.0 6900.0 153833.0 ... 15445.0 6831.0 3600.0 7273.0 7750.0 186367.0 10844.0 3049.0 145851.0 5590.0
Panama 1990 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 690.0 35395.0 23000.0 233786.5 ... 4576.0 4800.0 3600.0 1312.0 7750.0 4727.0 10844.0 627.0 283291.0 4318.0
China, Macao SAR 1971 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 35395.0 6900.0 233786.5 ... 4576.0 4800.0 3600.0 4760.0 7750.0 34868.0 10844.0 7298.5 283291.0 5590.0

5 rows × 120 columns

Anise, badian, fennel, coriander Crops Production tonnes Apples Crops Production tonnes Apricots Crops Production tonnes Areca nuts Crops Production tonnes Artichokes Crops Production tonnes Asparagus Crops Production tonnes Avocados Crops Production tonnes Bambara beans Crops Production tonnes Bananas Crops Production tonnes Barley Crops Production tonnes ... Sweet potatoes Crops Production tonnes Tangerines, mandarins, clementines, satsumas Crops Production tonnes Taro (cocoyam) Crops Production tonnes Tomatoes Crops Production tonnes Tung nuts Crops Production tonnes Vegetables&Melons, Total Crops Production tonnes Vetches Crops Production tonnes Watermelons Crops Production tonnes Wheat Crops Production tonnes Yams Crops Production tonnes
Area Year
Asia 1971 58640.0 3854825.0 304927.0 211648.0 17579.0 894532.0 50844.0 25100.0 10156070.0 14895301.0 ... 117991407.0 3482715.0 1683211.0 9128658.0 259000.0 127643958.0 246653.0 12715432.0 85759835.0 135718.0
Malta 1997 2686.0 371.0 27679.0 30102.0 21000.0 7905.5 15422.5 25100.0 78685.0 1507.0 ... 37000.0 43946.0 28678.0 20854.0 11255.0 71820.0 711.0 104279.0 9036.0 33237.0
Sweden 1988 2686.0 130000.0 27679.0 30102.0 21000.0 7905.5 15422.5 25100.0 78685.0 1878700.0 ... 37000.0 43946.0 28678.0 15542.0 11255.0 288883.0 113000.0 104279.0 1295500.0 33237.0
Iceland 1988 2686.0 240418.5 27679.0 30102.0 21000.0 7905.5 15422.5 25100.0 78685.0 471754.0 ... 37000.0 43946.0 28678.0 650.0 11255.0 1665.0 13640.0 104279.0 739850.0 33237.0
Wallis and Futuna Islands 1983 2686.0 240418.5 27679.0 30102.0 21000.0 7905.5 15422.5 25100.0 4000.0 471754.0 ... 37000.0 43946.0 1600.0 75241.0 11255.0 550.0 13640.0 104279.0 739850.0 560.0

5 rows × 122 columns

Anise, badian, fennel, coriander Crops Seed tonnes Bambara beans Crops Seed tonnes Bananas Crops Seed tonnes Barley Crops Seed tonnes Beans, dry Crops Seed tonnes Broad beans, horse beans, dry Crops Seed tonnes Buckwheat Crops Seed tonnes Cabbages and other brassicas Crops Seed tonnes Carrots and turnips Crops Seed tonnes Cassava Crops Seed tonnes ... Sorghum Crops Seed tonnes Soybeans Crops Seed tonnes Sugar cane Crops Seed tonnes Sweet potatoes Crops Seed tonnes Taro (cocoyam) Crops Seed tonnes Vegetables&Melons, Total Crops Seed tonnes Vetches Crops Seed tonnes Watermelons Crops Seed tonnes Wheat Crops Seed tonnes Yams Crops Seed tonnes
Area Year
Jordan 1988 0.0 1268.0 0.0 3361.0 3120.0 8.0 1305.0 4000.0 11669.0 81.5 ... 0.0 1260.0 104890.0 4426.5 4200.0 6488.0 141.0 16036.0 4700.0 8912.0
Namibia 1982 0.0 1268.0 0.0 28600.0 3120.0 1115.5 1305.0 4000.0 11669.0 81.5 ... 620.0 1260.0 104890.0 4426.5 4200.0 6488.0 899.5 16036.0 70.0 8912.0
Romania 2008 0.0 1268.0 0.0 90627.0 3615.0 1115.5 1305.0 4000.0 11669.0 81.5 ... 243.0 3500.0 104890.0 4426.5 4200.0 6488.0 899.5 16036.0 591519.0 8912.0
Central African Republic 1989 0.0 1268.0 0.0 28600.0 3120.0 1115.5 1305.0 4000.0 11669.0 81.5 ... 1041.0 1260.0 1500.0 4426.5 4200.0 6488.0 899.5 16036.0 43746.5 20000.0
Ecuador 2006 0.0 1268.0 0.0 4300.0 2049.0 723.0 1305.0 4000.0 11669.0 81.5 ... 123.0 2600.0 104890.0 4426.5 4200.0 22906.0 899.5 16036.0 2200.0 8912.0

5 rows × 52 columns

Anise, badian, fennel, coriander Crops Yield hg/ha Apples Crops Yield hg/ha Apricots Crops Yield hg/ha Areca nuts Crops Yield hg/ha Artichokes Crops Yield hg/ha Asparagus Crops Yield hg/ha Avocados Crops Yield hg/ha Bambara beans Crops Yield hg/ha Bananas Crops Yield hg/ha Barley Crops Yield hg/ha ... Sweet potatoes Crops Yield hg/ha Tangerines, mandarins, clementines, satsumas Crops Yield hg/ha Taro (cocoyam) Crops Yield hg/ha Tomatoes Crops Yield hg/ha Tung nuts Crops Yield hg/ha Vegetables&Melons, Total Crops Yield hg/ha Vetches Crops Yield hg/ha Watermelons Crops Yield hg/ha Wheat Crops Yield hg/ha Yams Crops Yield hg/ha
Area Year
Fiji 1982 7948.0 100000.0 65534.0 10559.0 104578.0 38812.0 70391.0 7074.0 69333.0 17858.0 ... 26667.0 120000.0 120000.0 164101.0 17350.0 36471.0 9543.0 150061.0 19163.5 140000.0
Bahamas 1992 7948.0 100000.0 65534.0 10559.0 104578.0 38812.0 70391.0 7074.0 136364.0 17858.0 ... 31617.0 120000.0 66323.0 133333.0 17350.0 95753.0 9543.0 150061.0 19163.5 78545.5
Central Asia 2008 7948.0 66426.0 64025.0 10559.0 111765.0 38812.0 70391.0 7074.0 115741.5 10908.0 ... 71071.0 110000.0 66323.0 296059.0 17350.0 240569.0 9543.0 185917.0 14037.0 78545.5
Namibia 1984 7948.0 100000.0 65534.0 10559.0 104578.0 38812.0 70391.0 7074.0 115741.5 17858.0 ... 71071.0 120000.0 66323.0 164101.0 17350.0 46154.0 9543.0 150061.0 29167.0 78545.5
Belgium-Luxembourg 1970 7948.0 630000.0 65534.0 10559.0 104578.0 45455.0 70391.0 7074.0 115741.5 30451.0 ... 71071.0 120000.0 66323.0 916155.0 17350.0 326757.0 9543.0 150061.0 38482.0 78545.5

5 rows × 120 columns

Animals live nes Livestock production Head Asses Livestock production Head Beehives Livestock production No Livestock production Buffaloes Livestock production Head Camelids, other Livestock production Head Camels Livestock production Head Cattle Livestock production Head Cattle and Buffaloes Livestock production Head Chickens Livestock production 1000 Head Ducks Livestock production 1000 Head ... Horses Livestock production Head Mules Livestock production Head Pigeons, other birds Livestock production 1000 Head Pigs Livestock production Head Poultry Birds Livestock production 1000 Head Rabbits and hares Livestock production 1000 Head Rodents, other Livestock production 1000 Head Sheep Livestock production Head Sheep and Goats Livestock production Head Turkeys Livestock production 1000 Head
Area Year
India 2010 532303.0 375000.0 11500000.0 107375000.0 4380000.0 457000.0 194184992.0 301559992.0 663300.0 25300.0 ... 619000.0 175000.0 2885.0 10640000.0 688600.0 427.0 18925.0 67744000.0 205064992.0 570.0
French Polynesia 1992 532303.0 37836.0 2000.0 288416.0 4380000.0 240000.0 7000.0 7000.0 150.0 33.0 ... 2200.0 20000.0 2885.0 36000.0 183.0 0.0 18925.0 350.0 15150.0 570.0
French Guiana 1973 532303.0 37836.0 187000.0 288416.0 4380000.0 240000.0 1600.0 1600.0 70.0 3.0 ... 100.0 20000.0 2885.0 4000.0 73.0 427.0 18925.0 2400.0 3200.0 570.0
Portugal 2014 532303.0 115000.0 333000.0 288416.0 4380000.0 240000.0 1549000.0 1549000.0 45000.0 600.0 ... 20000.0 55000.0 2885.0 2126000.0 51500.0 427.0 18925.0 2032000.0 2414000.0 6500.0
Melanesia 2008 532303.0 37836.0 15200.0 288416.0 4380000.0 240000.0 685500.0 685500.0 10244.0 100.0 ... 64650.0 20000.0 2885.0 2177000.0 10426.0 427.0 18925.0 15300.0 296000.0 82.0

5 rows × 22 columns

Animals live nes Live animals import quantities Head Asses Live animals import quantities Head Beehives Live animals import quantities No Live animals import quantities Bovine, Animals Live animals import quantities Head Buffaloes Live animals import quantities Head Camelids, other Live animals import quantities Head Camels Live animals import quantities Head Cattle Live animals import quantities Head Chickens Live animals import quantities 1000 Head Chickens Live animals import quantities Head ... Mules Live animals import quantities Head Pigeons, other birds Live animals import quantities 1000 Head Pigeons, other birds Live animals import quantities Head Pigs Live animals import quantities Head Rabbits and hares Live animals import quantities 1000 Head Rodents, other Live animals import quantities 1000 Head Sheep Live animals import quantities Head Sheep and Goats Live animals import quantities Head Turkeys Live animals import quantities 1000 Head Turkeys Live animals import quantities Head
Area Year
Eastern Africa 1988 0.0 0.0 0.0 4266.0 0.0 12.0 0.0 4266.0 2952.0 3727.5 ... 0.0 0.0 18.0 3340.0 0.0 77.0 2.0 2.0 9.0 728.5
Caribbean 1979 0.0 1.0 0.0 170.0 0.0 12.0 0.0 170.0 6646.0 3727.5 ... 0.0 0.0 18.0 32.0 0.0 77.0 4330.0 6493.0 0.0 728.5
South Korea 1981 0.0 0.0 0.0 15881.0 0.0 12.0 0.0 15881.0 252.0 3727.5 ... 0.0 0.0 18.0 561.0 0.0 77.0 0.0 0.0 0.0 728.5
New Caledonia 1998 0.0 0.0 0.0 47.0 0.0 12.0 0.0 47.0 33.0 3727.5 ... 0.0 0.0 18.0 0.0 0.0 77.0 20.0 21.0 9.0 728.5
Kenya 1966 0.0 0.0 0.0 40000.0 0.0 12.0 0.0 40000.0 182.0 3727.5 ... 0.0 0.0 18.0 14.0 0.0 77.0 79.0 79.0 9.0 728.5

5 rows × 24 columns

Animals live nes Live animals export quantities Head Asses Live animals export quantities Head Beehives Live animals export quantities No Live animals export quantities Bovine, Animals Live animals export quantities Head Buffaloes Live animals export quantities Head Camelids, other Live animals export quantities Head Camels Live animals export quantities Head Cattle Live animals export quantities Head Chickens Live animals export quantities 1000 Head Chickens Live animals export quantities Head ... Mules Live animals export quantities Head Pigeons, other birds Live animals export quantities 1000 Head Pigeons, other birds Live animals export quantities Head Pigs Live animals export quantities Head Rabbits and hares Live animals export quantities 1000 Head Rodents, other Live animals export quantities 1000 Head Sheep Live animals export quantities Head Sheep and Goats Live animals export quantities Head Turkeys Live animals export quantities 1000 Head Turkeys Live animals export quantities Head
Area Year
China, Taiwan Province of 1990 0.0 0.0 0.0 0.0 0.0 7.0 588.0 0.0 114.0 93192.5 ... 0.0 4.0 89.0 126.0 0.0 0.5 3729.0 0.0 1.0 7609.5
European Union 1975 0.0 904.0 0.0 4042029.0 320.0 7.0 588.0 4041709.0 104634.0 93192.5 ... 827.0 52.0 89.0 3540515.0 8646.0 0.5 4454469.0 4459852.0 1401.0 7609.5
Northern Africa 1993 0.0 0.0 146.0 10020.0 0.0 7.0 224.0 10020.0 9298.0 93192.5 ... 0.0 0.0 89.0 0.0 0.0 0.5 762753.0 792297.0 43.0 7609.5
Pakistan 1999 0.0 0.0 0.0 0.0 0.0 7.0 602.0 0.0 522.0 93192.5 ... 0.0 0.0 89.0 73.0 0.0 0.5 1630.0 7517.0 1.0 7609.5
European Union 1962 0.0 1527.0 0.0 1403274.0 0.0 7.0 588.0 1403274.0 33177.0 93192.5 ... 4408.0 0.0 89.0 1212495.0 28.0 0.5 1208873.0 1209299.0 0.0 7609.5

5 rows × 24 columns

Alfalfa meal and pellets Food import quantities tonnes Almonds shelled Food import quantities tonnes Animal Oil+Fat+Grs Food import quantities tonnes Animal Vegetable Oil Food import quantities tonnes Animal fats Food import quantities tonnes Anise, badian, fennel, coriander Food import quantities tonnes Apples Food import quantities tonnes Apricots Food import quantities tonnes Apricots, dry Food import quantities tonnes Artichokes Food import quantities tonnes ... Wheat+Flour,Wheat Equivalent Food import quantities tonnes Whey, Pres+Concen Food import quantities tonnes Whey, condensed Food import quantities tonnes Whey, dry Food import quantities tonnes Wine Food import quantities tonnes Wine+Vermouth+Sim. Food import quantities tonnes Wool, degreased Food import quantities tonnes Wool, greasy Food import quantities tonnes Wool, hair waste Food import quantities tonnes Yoghurt, concentrated or not Food import quantities tonnes
Area Year
Maldives 2001 46.0 2.0 2199.5 5130.0 2199.5 144.0 888.0 1.0 2.0 1.0 ... 21770.0 133.5 0.0 186.0 484.0 488.0 399.0 144.0 16.0 19.5
Chad 1963 46.0 93.5 2199.5 36.0 2199.5 167.0 66.0 21.0 22.0 2.0 ... 4445.0 133.5 0.0 186.0 1252.0 1252.0 399.0 144.0 16.0 19.5
China 1961 0.0 0.0 15200.0 32163.0 15200.0 179.0 8945.0 0.0 0.0 0.0 ... 4382573.0 0.0 0.0 0.0 465.0 0.0 255.0 21600.0 40.0 0.0
Falkland Islands (Malvinas) 1989 46.0 93.5 2199.5 28077.5 2199.5 167.0 1835.0 21.0 22.0 2.0 ... 118744.5 133.5 0.0 186.0 1000.0 1055.0 399.0 144.0 16.0 19.5
Cameroon 2003 46.0 11.0 13.0 43387.0 13.0 5.0 1123.0 0.0 0.0 2.0 ... 276697.0 119.0 119.0 186.0 6544.0 6588.0 399.0 144.0 2.0 30.0

5 rows × 454 columns

Alfalfa meal and pellets Food export quantities tonnes Almonds shelled Food export quantities tonnes Animal Oil+Fat+Grs Food export quantities tonnes Animal Vegetable Oil Food export quantities tonnes Animal fats Food export quantities tonnes Anise, badian, fennel, coriander Food export quantities tonnes Apples Food export quantities tonnes Apricots Food export quantities tonnes Apricots, dry Food export quantities tonnes Artichokes Food export quantities tonnes ... Wheat+Flour,Wheat Equivalent Food export quantities tonnes Whey, Pres+Concen Food export quantities tonnes Whey, condensed Food export quantities tonnes Whey, dry Food export quantities tonnes Wine Food export quantities tonnes Wine+Vermouth+Sim. Food export quantities tonnes Wool, degreased Food export quantities tonnes Wool, greasy Food export quantities tonnes Wool, hair waste Food export quantities tonnes Yoghurt, concentrated or not Food export quantities tonnes
Area Year
World 2006 1082135.0 405469.0 3211033.0 70391448.0 3211033.0 263083.0 6999767.0 248759.0 136791.0 55363.0 ... 139625731.0 2102792.0 294588.0 1808204.0 8369330.0 8768568.0 309204.0 653526.0 41984.0 1203883.0
Iraq 2004 32.0 9.0 223.0 33.0 223.0 318.0 0.0 61.0 19.0 1.0 ... 237354.0 20.0 0.0 25.0 178.0 195.0 686.0 960.0 0.0 1.0
Niue 1998 32.0 9.0 223.0 0.0 223.0 36.5 382.0 61.0 19.0 1.0 ... 2511.0 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0
Bermuda 1996 32.0 9.0 223.0 8632.0 223.0 36.5 382.0 61.0 19.0 1.0 ... 2511.0 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0
Hungary 1992 27220.0 23.0 28773.0 246844.0 28773.0 1004.0 199524.0 5262.0 19.0 1.0 ... 1277206.0 1789.0 0.0 1789.0 78407.0 79747.0 707.0 5590.0 305.0 39.0

5 rows × 445 columns

1.F.b. Merging everything

We are now creating the uniformized dataframe uni_df. Each row corresponds to one tuple (Country, Year) so that we can later group by country or year. In addition to the country ("Area") and the "Year", the columns will either be an economic feature ("GDP", "CPI") or an agricultural one (some crop harvested area, some livestock export quantity, ...)

In [53]:
uni_df = df_useful['GDP']
for df_name in need_pivot :
    uni_df = pd.merge(uni_df, df_useful[df_name], how='outer', on=['Area', 'Year'])
uni_df = pd.merge(uni_df,df_useful['Consumer price indices'], how='outer', on=['Area', 'Year'])

# Deal with the NaN that appeared
for column in list(uni_df):
    if column not in ['Area', 'Year']:
        uni_df[column].fillna(uni_df[column].median(), inplace=True)
uni_df.sample(30)
Out[53]:
Area Year (GDP, million $) Anise, badian, fennel, coriander Crops Area harvested ha Apples Crops Area harvested ha Apricots Crops Area harvested ha Areca nuts Crops Area harvested ha Artichokes Crops Area harvested ha Asparagus Crops Area harvested ha Avocados Crops Area harvested ha ... Whey, Pres+Concen Food export quantities tonnes Whey, condensed Food export quantities tonnes Whey, dry Food export quantities tonnes Wine Food export quantities tonnes Wine+Vermouth+Sim. Food export quantities tonnes Wool, degreased Food export quantities tonnes Wool, greasy Food export quantities tonnes Wool, hair waste Food export quantities tonnes Yoghurt, concentrated or not Food export quantities tonnes (Consumer price indices, %)
10438 Americas 2007 1.988917e+07 26853.0 347171.0 9983.0 49401.0 21670.0 59387.0 259629.0 ... 493544.0 5957.0 487587.0 1408037.0 1429348.0 15750.0 53363.0 10277.0 16616.0 97.895927
5539 Marshall Islands 1984 4.555637e+01 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0 97.895927
14987 South Africa 2016 1.008410e+04 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0 117.388959
8391 Sudan 1984 1.008410e+04 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0 97.895927
695 Bangladesh 1975 8.475804e+03 3100.0 21040.0 5000.0 37336.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0 97.895927
5154 Liechtenstein 2013 6.391711e+03 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0 97.895927
5912 Montserrat 1989 6.368785e+01 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0 97.895927
4768 Kuwait 1995 2.655406e+04 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 2143.0 131.0 3.0 97.895927
12324 Equatorial Guinea 1964 1.008410e+04 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0 97.895927
6814 Papua New Guinea 1971 1.217970e+03 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 0.0 0.0 316.0 510.0 131.0 1.0 97.895927
3267 Finland 2012 2.567065e+05 3100.0 667.0 5000.0 49401.0 2800.0 22.0 2729.0 ... 36397.0 0.0 36397.0 1394.0 1888.0 0.0 0.0 0.0 32758.0 101.420965
13763 South America 1964 1.008410e+04 0.0 24744.0 800.0 49401.0 4760.0 1473.0 33128.0 ... 0.0 0.0 0.0 8454.0 8491.0 21845.0 120805.0 2554.0 0.0 97.895927
1820 Chile 1996 7.797216e+04 3100.0 34800.0 2130.0 49401.0 2480.0 4105.0 15050.0 ... 56.0 0.0 56.0 203369.0 203922.0 67.0 2212.0 194.0 2217.0 97.895927
4856 Laos 1991 1.028365e+03 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0 97.895927
7511 Saint Vincent and the Grenadines 1978 5.372638e+01 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0 97.895927
4170 Indonesia 1995 2.364560e+05 3100.0 21040.0 5000.0 74838.0 2800.0 2173.0 19377.0 ... 0.0 0.0 0.0 178.0 0.0 6.0 7.0 11.0 11.0 97.895927
1894 China, Macao SAR 1978 7.403681e+02 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 0.0 0.0 0.0 0.0 0.0 1.0 97.895927
4958 Lebanon 2001 1.706534e+04 55.0 9460.0 5781.0 49401.0 80.0 2173.0 286.0 ... 20.0 0.0 25.0 1074.0 1127.0 0.0 269.0 0.0 1.0 97.895927
14904 Myanmar 2016 1.008410e+04 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0 139.323031
2633 Denmark 1976 4.457566e+04 519.0 21040.0 5000.0 49401.0 2800.0 300.0 2729.0 ... 27.0 0.0 27.0 1579.0 1738.0 82.0 135.0 0.0 0.0 97.895927
1414 Burkina Faso 2004 4.838551e+03 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 19.0 28.0 316.0 510.0 131.0 1.0 76.526778
5000 Lesotho 1997 8.597294e+02 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 3300.0 131.0 1.0 97.895927
9319 Ukraine 1992 8.135948e+04 1600.0 355500.0 26000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 44000.0 44000.0 316.0 330.0 131.0 1.0 97.895927
5846 Mongolia 2015 1.175794e+04 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0 160.273580
6487 Niger 2012 6.942209e+03 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 0.0 0.0 0.0 2.0 2.0 316.0 510.0 131.0 0.0 110.773349
13379 Pacific Islands Trust Territory 1984 1.008410e+04 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0 97.895927
8759 Thailand 1984 4.323649e+04 3100.0 21040.0 5000.0 9600.0 2800.0 0.0 2729.0 ... 0.0 0.0 0.0 3.0 3.0 0.0 510.0 0.0 0.0 97.895927
10927 Europe 1990 8.726827e+06 11751.0 2261092.0 144104.0 49401.0 97542.0 64736.0 18067.0 ... 544587.0 41674.0 502913.0 3848227.0 4110805.0 76768.0 59830.0 46440.0 225868.0 97.895927
11103 Western Europe 1982 1.798503e+06 1623.0 82552.0 12500.0 49401.0 14500.0 23206.0 2729.0 ... 337736.0 0.0 337736.0 1182909.0 1247329.0 23764.0 32519.0 26501.0 0.0 97.895927
12138 China, Taiwan Province of 1994 1.008410e+04 3100.0 1070.0 9079.0 36316.0 2800.0 1471.0 2729.0 ... 0.0 0.0 0.0 7.0 7.0 9112.0 29.0 417.0 2.0 97.895927

30 rows × 1387 columns

2.A.a Crops and livestock production and trade

TODO

2.A.b Introducing the concept of food self-sufficiency

In this section we will present and compute the notion of food self-sufficiency.

2.A.b.i Basic idea

One may wonder how to know whether a country produce all the food it needs or not. The notion of food-self-sufficency allows to answer to this question. More formally, it is a rate that decribes how much does a country can satisfy to meet its internal consumption needs by production. It describes the extent to which a country is able to feed its population through its domestic food production. We are interested into this measure since we think it could be correlated with the economic conditions of this country.

2.A.b.ii Formula and computation

In order to compute the food self-sufficiency, we will apply the following formula that gives us the food self-sudfficiency as a percentage :

$$\frac{Production \times 100}{Production + Imports – Exports}$$
In [54]:
all_columns = list(uni_df)
production_columns = []
import_columns = []
export_columns = []
for column in all_columns:
    if (type(column)==tuple) and column[1]=='tonnes':
        if 'export quantities' in column[0]:
            export_columns.append(column)
        elif 'import quantities' in column[0]:
            import_columns.append(column)
        elif 'Production' in column[0]:
            production_columns.append(column)
            
uni_df[('All productions','tonnes')] = 0
for column in production_columns :
    uni_df[('All productions','tonnes')] += uni_df[column]            
uni_df[('All imports','tonnes')] = 0
for column in import_columns :
    uni_df[('All imports','tonnes')] += uni_df[column]            
uni_df[('All exports','tonnes')] = 0
for column in export_columns :
    uni_df[('All exports','tonnes')] += uni_df[column]
            
uni_df[('food self-sufficiency','%')] = 100 * uni_df[('All productions','tonnes')] / (uni_df[('All productions','tonnes')]+uni_df[('All imports','tonnes')]+uni_df[('All exports','tonnes')])
In [55]:
display(uni_df[['Area','Year',('food self-sufficiency','%')]].sample(5))
Area Year (food self-sufficiency, %)
14337 Falkland Islands (Malvinas) 2005 NaN
7745 Senegal 1982 NaN
4161 Indonesia 1986 NaN
4503 Japan 2006 NaN
11471 Land Locked Developing Countries 1982 NaN
In [56]:
plot = uni_df[['Area','Year']]
plot["Value"] = uni_df[('food self-sufficiency','%')]
for year in range(1980, 2010, 5):
    display(year, visualise_world_data_folium(plot, year, False))
    
1980
1985
1990
1995
2000
2005

2.B. Consumer price indices

      1. Definition

      2. Usage

2.B.a. Definition

Consumer price indices (CPI) are a way to measure the changes of the average price level of goods. One can use it to measure inflation or deflation over some period. More technically, for a given item, the CPI is the ratio of the market basket fo two different years. Global CPI is an average of sigle item CPI with some standardized weights.

2.B.b. Usage

The CPI have many usages and is often taken into consideration. For instance it is used for budget and pensions revisions, monetary and economic policies, and economic analysis .

We will use the CPI to answer the following questions: "Are prices more stables in more self-sufficient countries ?", "Is there a link between the CPI and the agricultural features ?"

2.C. Structure of international trade and historical context

Our dataset contains data for the historical period from 1970 to 2015. In order to be able to correctly interpret the results we are going to see, we first made an historical research on this period. We shortly listed below important events of this period for which we think they have had a significant influence on the agriculture and the economy.

There was the Cold war from 1945 to 1990 with two economic superpowers (USA and USSR). The USSR had been dissoluted in 1991. The Japanese economic miracle occured from 1945 to 1990 and allowed Japan to come out of the disastrous state in which it was at the exit of the WW2 and become world's second largest economy. There has been 2 big oil crisis in 1973 and 1979. There has been many wars (Middle East wars 1973-2000 e.g. Yom Kippur War 1973, Islamic Revolution in Iran 1979, Iran–Iraq war 1980-1988, Gulf war 1990-1991, Yugoslav wars 1991-2001...). We have already seen some consequences of such events by dealing with countries names in a previous section.

The third Agricultural Revolution (also known as Green revolution) occurs form 1960 to 1990 and imporved agricultural productions thanks to fertilizers and chemicals.

The following public-domain image from Wikimedia represents developed countries (blue), developing ones (orange) and least developed ones (red) according to the United Nations and International Monetary Fund. We expect to see similar results with our dataset (GDP).

The following image, also from Wikimedia shows the cumulative commercial balance for the period 1980-2008. We also expect to see similar results with our dataset, but there might be difference as we focus on agriculture.

In order to have an idea of the international trade and economy structure, we are interested in GDP:

In [57]:
pivoted_GDP_df = uni_df[['Area','Year']]
pivoted_GDP_df["GDP"] = uni_df["(GDP, million $)"]
pivoted_GDP_df = pivoted_GDP_df.pivot_table(index='Year', columns='Area', values="GDP").dropna(axis=1)
In [58]:
pivoted_GDP_df.sample(5)
Out[58]:
Area Afghanistan Albania Algeria Angola Argentina Aruba Australia Austria Bahrain Bangladesh ... Turkey Uganda United Arab Emirates United Kingdom United Republic of Tanzania United States of America Uruguay Vietnam Zambia Zimbabwe
Year
1972 1595.555482 2397.526448 7176.428384 4102.154582 37907.787897 195.963605 59269.261908 22007.036968 575.580428 6215.380293 ... 28974.911661 1711.626558 2173.209706 1.699691e+05 2965.929453 1282400.0 2450.067269 3236.929352 1851.647204 2888.078733
1995 3236.300000 2392.764887 41971.488420 6642.229474 279701.425370 1320.670391 392103.229540 240473.745470 6786.968612 37866.029743 ... 227606.778210 7146.007870 65743.604966 1.320322e+06 7760.794768 7664060.0 21312.174665 20736.258012 3806.998070 9576.473659
1961 10084.100000 10084.100000 10084.100000 10084.100000 10084.100000 10084.100000 10084.100000 10084.100000 10084.100000 10084.100000 ... 10084.100000 10084.100000 10084.100000 1.008410e+04 10084.100000 10084.1 10084.100000 10084.100000 10084.100000 10084.100000
1965 10084.100000 10084.100000 10084.100000 10084.100000 10084.100000 10084.100000 10084.100000 10084.100000 10084.100000 10084.100000 ... 10084.100000 10084.100000 10084.100000 1.008410e+04 10084.100000 10084.1 10084.100000 10084.100000 10084.100000 10084.100000
1971 1831.108982 2330.584860 5363.676160 4007.445465 40655.363076 186.045845 50359.689286 17815.727894 454.985468 5577.824265 ... 23057.152466 1575.302401 1443.434907 1.481066e+05 2605.423796 1167800.0 3048.552419 3023.991667 1574.164973 2338.636138

5 rows × 146 columns

As we can see on a subset of the correlation matrix below, GDP are often hugely correlated between countries.

In [59]:
selected_countries = ['Algeria', 'Australia', 'Austria', 'Bangladesh', 'China',
                      'Djibouti', 'France', 'Germany', 'India', 'Japan', 'Mali',
                      'Switzerland', 'United States of America']

corr = pivoted_GDP_df[selected_countries].corr()
corr.style.background_gradient(cmap='coolwarm')
Out[59]:
Area Algeria Australia Austria Bangladesh China Djibouti France Germany India Japan Mali Switzerland United States of America
Area
Algeria 1 0.980818 0.922843 0.938941 0.927377 -0.342829 0.912461 0.887093 0.972868 0.768559 0.516948 0.948838 0.909623
Australia 0.980818 1 0.951373 0.953673 0.935553 -0.349658 0.938023 0.919684 0.984724 0.816869 0.520149 0.973969 0.937551
Austria 0.922843 0.951373 1 0.895146 0.825707 -0.437165 0.997136 0.992467 0.913345 0.929508 0.402053 0.988033 0.982963
Bangladesh 0.938941 0.953673 0.895146 1 0.982827 -0.249451 0.87078 0.858809 0.985178 0.738215 0.593815 0.938569 0.902974
China 0.927377 0.935553 0.825707 0.982827 1 -0.195531 0.796355 0.777185 0.977373 0.632289 0.62787 0.884897 0.829639
Djibouti -0.342829 -0.349658 -0.437165 -0.249451 -0.195531 1 -0.491127 -0.510712 -0.297009 -0.529611 0.611626 -0.431398 -0.476802
France 0.912461 0.938023 0.997136 0.87078 0.796355 -0.491127 1 0.995472 0.893948 0.937746 0.341338 0.981299 0.981912
Germany 0.887093 0.919684 0.992467 0.858809 0.777185 -0.510712 0.995472 1 0.873454 0.958066 0.307788 0.978105 0.977361
India 0.972868 0.984724 0.913345 0.985178 0.977373 -0.297009 0.893948 0.873454 1 0.748194 0.567755 0.951028 0.911338
Japan 0.768559 0.816869 0.929508 0.738215 0.632289 -0.529611 0.937746 0.958066 0.748194 1 0.189758 0.909274 0.923719
Mali 0.516948 0.520149 0.402053 0.593815 0.62787 0.611626 0.341338 0.307788 0.567755 0.189758 1 0.429605 0.355082
Switzerland 0.948838 0.973969 0.988033 0.938569 0.884897 -0.431398 0.981299 0.978105 0.951028 0.909274 0.429605 1 0.977426
United States of America 0.909623 0.937551 0.982963 0.902974 0.829639 -0.476802 0.981912 0.977361 0.911338 0.923719 0.355082 0.977426 1

The correlation matrix contains lots of values that are very closed to one (red). This is also true for the whole correlation matrix as seen below:

In [60]:
f = plt.figure(figsize=(19, 15))
plt.matshow(pivoted_GDP_df.corr(), fignum=f.number)
cb = plt.colorbar()
cb.ax.tick_params()
plt.title('Correlation Matrix', fontsize=16);

We then try to clusterize this correlation matrix in order to find countries whose GDP are correlated:

In [61]:
corr = pivoted_GDP_df.corr().values
pdist = spc.distance.pdist(corr)   # vector of ('55' choose 2) pairwise distances
linkage = spc.linkage(pdist, method='complete')
ind = spc.fcluster(linkage, 0.5*pdist.max(), 'distance')
columns = [pivoted_GDP_df.columns.tolist()[i] for i in list((np.argsort(ind)))]
clusterised_df = pivoted_GDP_df.reindex(columns, axis=1)

f = plt.figure(figsize=(19, 15))
plt.matshow(clusterised_df.corr(), fignum=f.number)
cb = plt.colorbar()
cb.ax.tick_params()
plt.title('Correlation Matrix', fontsize=16);

We have found 4 regions in which the GDP is highly correlated and between which the correlation coefficent is lower. We could refine the big clusters by iterating this method.

Interpretation: The correlation matrix of GDP contains lots of values that are very closed to one. This means that GDP in two different countries have a trend to evolve the same way. Therefore, we can say that the world countries have strong enough trading relations to make the GDP evolve the same way. The fact that we have found some main clusters could be interpreted as regions in which the trading relations are more important.

2.D. Economic classification of countries

In [62]:
plot = uni_df[['Area','Year']]
plot["Value"] = uni_df["(GDP, million $)"]
for year in range(1980, 2015, 5):
    display(year, visualise_world_data_folium(plot, year, True))
1980
1985
1990
1995
2000
2005
2010

3.A. Production and consumption of food from a time-space geography point of view

3.B. Food self-sufficiency

3.C. Analysis of food production and trade with GDP

3.C.a. Food production and GDP

Out of the crops and the livestock produced, we will focus on the top 20 crops as well as top 20 livestock produced globally to be able to have a model with a reasonable amount of feautures.

In [63]:
#Choose the top 20 crops produced in the world for the first model
crop_production_df= uni_df.filter(like='Crops Production tonnes') 
crop_production_df = crop_production_df.sum(axis=0).sort_values(ascending=False)
crop_production_list = crop_production_df.index.tolist()[:20]
crop_production_list.insert(0,'(GDP, million $)')
crop_production_list
#Take the top 20 crops and the columns of our uni_df which contain their production
maincrops_df = uni_df[crop_production_list]

#Livestock
livestock_production_df= uni_df.filter(like='Livestock production') 
livestock_production_df = livestock_production_df.sum(axis=0).sort_values(ascending=False)
livestock_production_list = livestock_production_df.index.tolist()[:20]
livestock_production_list.insert(0,'(GDP, million $)')
livestock_production_list
#Take the top 20 crops and the columns of our uni_df which contain their production
mainlivestock_df = uni_df[livestock_production_list]

top_production_list = crop_production_list +livestock_production_list[1:]

top_production_df = uni_df[top_production_list]
top_production_df.head(5)
Out[63]:
(GDP, million $) Cereals (Rice Milled Eqv) Crops Production tonnes Sugar cane Crops Production tonnes Roots and Tubers,Total Crops Production tonnes Vegetables&Melons, Total Crops Production tonnes Wheat Crops Production tonnes Rice, paddy Crops Production tonnes Maize Crops Production tonnes Fruit excl Melons,Total Crops Production tonnes Sugar beet Crops Production tonnes ... Asses Livestock production Head Camels Livestock production Head Animals live nes Livestock production Head Mules Livestock production Head Poultry Birds Livestock production 1000 Head Chickens Livestock production 1000 Head Rodents, other Livestock production 1000 Head Ducks Livestock production 1000 Head Rabbits and hares Livestock production 1000 Head Turkeys Livestock production 1000 Head
0 1748.886597 3390122.0 55000.0 144000.0 482000.0 2081000.0 366000.0 667000.0 794600.0 62000.0 ... 1300000.0 300000.0 532303.0 35000.0 6000.0 6000.0 18925.0 600.0 427.0 570.0
1 1831.108982 3202450.0 50000.0 167000.0 558000.0 1915000.0 350000.0 670000.0 634800.0 60000.0 ... 1300000.0 300000.0 532303.0 35000.0 6000.0 6000.0 18925.0 600.0 427.0 570.0
2 1595.555482 3816800.0 35000.0 158000.0 500000.0 2450000.0 400000.0 720000.0 781400.0 63300.0 ... 1300000.0 300000.0 532303.0 31000.0 6000.0 6000.0 18925.0 600.0 427.0 570.0
3 1733.333275 4130140.0 58000.0 165000.0 515000.0 2700000.0 420000.0 760000.0 822500.0 57000.0 ... 1250000.0 300000.0 532303.0 26500.0 6100.0 6100.0 18925.0 600.0 427.0 570.0
4 2155.555500 4211140.0 51600.0 182000.0 518000.0 2750000.0 420000.0 770000.0 847500.0 66600.0 ... 1250000.0 300000.0 532303.0 26500.0 6100.0 6100.0 18925.0 600.0 427.0 570.0

5 rows × 41 columns

In [64]:
#Checking correlations of main crops between each other and with GDP
mainprod_correlation_matrix = round (top_production_df.corr(method='pearson'),3)
mainprod_correlation_matrix['(GDP, million $)'].sort_values(ascending = False)
Out[64]:
(GDP, million $)                                         1.000
Maize Crops Production tonnes                            0.837
Tomatoes Crops Production tonnes                         0.833
Chickens Livestock production 1000 Head                  0.832
Poultry Birds Livestock production 1000 Head             0.828
Citrus Fruit,Total Crops Production tonnes               0.806
Cereals (Rice Milled Eqv) Crops Production tonnes        0.797
Fruit excl Melons,Total Crops Production tonnes          0.792
Wheat Crops Production tonnes                            0.777
Soybeans Crops Production tonnes                         0.754
Vegetables&Melons, Total Crops Production tonnes         0.742
Turkeys Livestock production 1000 Head                   0.741
Bananas Crops Production tonnes                          0.700
Pigs Livestock production Head                           0.697
Sugar cane Crops Production tonnes                       0.689
Rabbits and hares Livestock production 1000 Head         0.682
Beehives Livestock production No Livestock production    0.667
Roots and Tubers,Total Crops Production tonnes           0.658
Grapes Crops Production tonnes                           0.657
Ducks Livestock production 1000 Head                     0.650
Cattle Livestock production Head                         0.644
Potatoes Crops Production tonnes                         0.641
Cattle and Buffaloes Livestock production Head           0.637
Animals live nes Livestock production Head               0.627
Horses Livestock production Head                         0.622
Barley Crops Production tonnes                           0.618
Sheep and Goats Livestock production Head                0.611
Rice, paddy Crops Production tonnes                      0.599
Sugar beet Crops Production tonnes                       0.598
Sheep Livestock production Head                          0.592
Goats Livestock production Head                          0.584
Oil, palm fruit Crops Production tonnes                  0.577
Cassava Crops Production tonnes                          0.564
Sorghum Crops Production tonnes                          0.551
Mules Livestock production Head                          0.538
Asses Livestock production Head                          0.506
Buffaloes Livestock production Head                      0.469
Sweet potatoes Crops Production tonnes                   0.412
Camelids, other Livestock production Head                0.403
Camels Livestock production Head                         0.342
Rodents, other Livestock production 1000 Head            0.016
Name: (GDP, million $), dtype: float64

Because "Cereals (Rice Milled Eqv) Crops Production tonnes" has such a high correlation with a lot of other features, it is probably an aggregate of them (eg over 90% with wheat).

In [65]:
#Looking at some relationships
for item in list(top_production_df.columns)[1:]:
    top_production_df.plot(kind='scatter', x=item, y='(GDP, million $)', grid=True)

We can quite clearly see that the production of many of the most popular crops can well be related to GDP.

3.C.b. Trade and GDP

Let's see if the same can be said about the most exported/imported goods.

In [66]:
#Choose the top 20 most exported items by mass
top_exports_df= uni_df.filter(like='export quantities tonnes')
top_exports_df = top_exports_df.sum(axis=0).sort_values(ascending=False)
top_exports_list =top_exports_df.index.tolist()[:20]
top_exports_list

#Take the top 20 exported crops and the columns of our uni_df which contain their production
top_exports_list.insert(0,'(GDP, million $)')
top_exports_df = uni_df[top_exports_list]


#Choose the top 20 most imported items by mass
top_imports_df= uni_df.filter(like='import quantities tonnes')
top_imports_df = top_imports_df.sum(axis=0).sort_values(ascending=False)

top_imports_list =top_imports_df.index.tolist()[:20]
top_imports_list

#Take the top 20 imported items and the columns of our uni_df which contain their production
top_imports_list.insert(0,'(GDP, million $)')
top_imports_df = uni_df[top_imports_list]

#Create a joined list
top_traded_list = top_exports_list +top_imports_list[1:]
In [67]:
top_traded_df = uni_df[top_traded_list]
top_traded_df.head(5)
Out[67]:
(GDP, million $) Cereals Food export quantities tonnes Wheat+Flour,Wheat Equivalent Food export quantities tonnes Wheat Food export quantities tonnes Fodder & Feeding stuff Food export quantities tonnes Maize Food export quantities tonnes Milk Equivalent Food export quantities tonnes Sugar and Honey Food export quantities tonnes Oilseeds Food export quantities tonnes Oilseed Cake Meal Food export quantities tonnes ... Cake, soybeans Food import quantities tonnes Animal Vegetable Oil Food import quantities tonnes Sugar,Total (Raw Equiv.) Food import quantities tonnes Beverages Food import quantities tonnes Fixed Vegetable Oils Food import quantities tonnes Cassava Equivalent Food import quantities tonnes Sugar Raw Centrifugal Food import quantities tonnes Rice Food import quantities tonnes Rice - total (Rice milled equivalent) Food import quantities tonnes Barley Food import quantities tonnes
0 1748.886597 20000.0 2511.0 623.0 0.0 900.0 3617.5 0.0 7229.0 0.0 ... 19731.0 8286.0 57611.0 12.0 8286.0 190.0 3324.5 0.0 0.0 1650.0
1 1831.108982 20000.0 2511.0 623.0 0.0 900.0 3617.5 0.0 3456.0 0.0 ... 19731.0 11763.0 49413.0 20.0 11763.0 190.0 3324.5 3000.0 3000.0 1650.0
2 1595.555482 20000.0 2511.0 623.0 0.0 900.0 3617.5 0.0 34541.0 0.0 ... 19731.0 14471.0 46332.0 0.0 14471.0 190.0 3324.5 0.0 0.0 1650.0
3 1733.333275 20000.0 2511.0 623.0 0.0 900.0 3617.5 0.0 27678.0 0.0 ... 19731.0 10279.0 60959.0 12.0 10279.0 190.0 3324.5 0.0 0.0 1650.0
4 2155.555500 20000.0 2511.0 623.0 0.0 900.0 3617.5 0.0 18490.0 0.0 ... 19731.0 11017.0 54024.0 11.0 11017.0 190.0 3324.5 0.0 0.0 1650.0

5 rows × 41 columns

In [68]:
#Checking correlations of main traded goods between each other and with GDP
top_traded_correlation_matrix = round (top_traded_df.corr(method='pearson'),3)
top_traded_correlation_matrix['(GDP, million $)'].sort_values(ascending = False)
Out[68]:
(GDP, million $)                                                        1.000
Total Meat Food export quantities tonnes                                0.827
Fodder & Feeding stuff Food export quantities tonnes                    0.823
Animal Vegetable Oil Food import quantities tonnes                      0.806
Fixed Vegetable Oils Food import quantities tonnes                      0.797
Milk Equivalent Food import quantities tonnes                           0.779
Sugar and Honey Food import quantities tonnes                           0.768
Cereals Food export quantities tonnes                                   0.759
Animal Vegetable Oil Food export quantities tonnes                      0.758
Oilseed Cake Meal Food export quantities tonnes                         0.758
Beverages Food import quantities tonnes                                 0.755
Oilseeds Food export quantities tonnes                                  0.749
Sugar,Total (Raw Equiv.) Food import quantities tonnes                  0.741
Wheat+Flour,Wheat Equivalent Food export quantities tonnes              0.737
Fixed Vegetable Oils Food export quantities tonnes                      0.735
Oilseeds Food import quantities tonnes                                  0.734
Cereals Food import quantities tonnes                                   0.732
Wheat Food export quantities tonnes                                     0.731
Maize Food import quantities tonnes                                     0.720
Fodder & Feeding stuff Food import quantities tonnes                    0.717
Sugar and Honey Food export quantities tonnes                           0.714
Wheat+Flour,Wheat Equivalent Food import quantities tonnes              0.712
Wheat Food import quantities tonnes                                     0.712
Maize Food export quantities tonnes                                     0.706
Sugar Raw Centrifugal Food import quantities tonnes                     0.702
Cake, soybeans Food export quantities tonnes                            0.696
Soybeans Food import quantities tonnes                                  0.694
Beverages Food export quantities tonnes                                 0.693
Milk Equivalent Food export quantities tonnes                           0.692
Rice - total  (Rice milled equivalent) Food import quantities tonnes    0.689
Rice Food import quantities tonnes                                      0.688
Soybeans Food export quantities tonnes                                  0.686
Barley Food import quantities tonnes                                    0.683
Sugar,Total (Raw Equiv.) Food export quantities tonnes                  0.681
Oilseed Cake Meal Food import quantities tonnes                         0.678
Cake, soybeans Food import quantities tonnes                            0.675
Rice Food export quantities tonnes                                      0.672
Rice - total  (Rice milled equivalent) Food export quantities tonnes    0.672
Barley Food export quantities tonnes                                    0.641
Sugar Raw Centrifugal Food export quantities tonnes                     0.557
Cassava Equivalent Food import quantities tonnes                        0.530
Name: (GDP, million $), dtype: float64

It seems that the most high correlations can be found from exports of meat as well as feeding stuff.

In [69]:
#Looking at some relationships
for item in list(top_traded_df.columns)[1:]:
    top_traded_df.plot(kind='scatter', x=item, y='(GDP, million $)', grid=True)
3.C.c. Machine learning model
3.C.c.i. Model - based on production

First, we only create a model using the production data. Afterwards, we use the data on exports, imports.

Standardization, scaling

Clearly seeing that our data isn't normally distributed, this means we shouldn't use standardization. We'll use a min-max scaling to make different quantities comparable in the same model.

In [70]:
sns.distplot(maincrops_df["Sugar cane Crops Production tonnes"], rug=False, hist=False)
Out[70]:
<matplotlib.axes._subplots.AxesSubplot at 0x1bcc07b5c48>
In [71]:
sns.distplot(np.log(maincrops_df["Maize Crops Production tonnes"]), rug=False, hist=False)
Out[71]:
<matplotlib.axes._subplots.AxesSubplot at 0x1bccd59c548>

We can clearly see that the production is not normally distributed, thus we shouldn't use a normalization of our features. The huge peak is probably from all the Nan values we set to the median...

In [72]:
#With MINMAX and not standardization, because it isn't a normal distribution.
top_production_values = top_production_df.values
min_max_scaler = preprocessing.MinMaxScaler()
top_production_minmaxed_values = min_max_scaler.fit_transform(top_production_values)
top_production_minmaxed = pd.DataFrame(top_production_minmaxed_values, columns=top_production_df.columns)
top_production_minmaxed.head(5)
Out[72]:
(GDP, million $) Cereals (Rice Milled Eqv) Crops Production tonnes Sugar cane Crops Production tonnes Roots and Tubers,Total Crops Production tonnes Vegetables&Melons, Total Crops Production tonnes Wheat Crops Production tonnes Rice, paddy Crops Production tonnes Maize Crops Production tonnes Fruit excl Melons,Total Crops Production tonnes Sugar beet Crops Production tonnes ... Asses Livestock production Head Camels Livestock production Head Animals live nes Livestock production Head Mules Livestock production Head Poultry Birds Livestock production 1000 Head Chickens Livestock production 1000 Head Rodents, other Livestock production 1000 Head Ducks Livestock production 1000 Head Rabbits and hares Livestock production 1000 Head Turkeys Livestock production 1000 Head
0 0.000022 0.001318 0.000029 0.000170 0.000412 0.002855 0.000494 0.000643 0.001153 0.000197 ... 0.029312 0.010816 0.076211 0.002322 0.000257 0.000280 0.823083 0.000484 0.000555 0.00118
1 0.000023 0.001245 0.000026 0.000198 0.000477 0.002627 0.000472 0.000646 0.000921 0.000191 ... 0.029312 0.010816 0.076211 0.002322 0.000257 0.000280 0.823083 0.000484 0.000555 0.00118
2 0.000020 0.001484 0.000018 0.000187 0.000428 0.003361 0.000539 0.000694 0.001133 0.000201 ... 0.029312 0.010816 0.076211 0.002057 0.000257 0.000280 0.823083 0.000484 0.000555 0.00118
3 0.000022 0.001606 0.000030 0.000195 0.000440 0.003704 0.000566 0.000732 0.001193 0.000181 ... 0.028185 0.010816 0.076211 0.001758 0.000261 0.000285 0.823083 0.000484 0.000555 0.00118
4 0.000028 0.001638 0.000027 0.000215 0.000443 0.003772 0.000566 0.000742 0.001229 0.000212 ... 0.028185 0.010816 0.076211 0.001758 0.000261 0.000285 0.823083 0.000484 0.000555 0.00118

5 rows × 41 columns

In [73]:
train_set, test_set = train_test_split(top_production_minmaxed.values, test_size = 0.2, random_state = 1)

X_train_set = train_set[:,1:]
Y_train_set = train_set[:,0]

X_test_set = test_set[:,1:]
Y_test_set = test_set[:,0]
In [74]:
number_of_folds = 5
scores = []
list_of_alpha = [i for i in np.arange(0,400,1)]
for alpha in tqdm(list_of_alpha):
    clf = Ridge(alpha = alpha)
    score = cross_val_score(clf, X_train_set, Y_train_set, cv=number_of_folds, scoring = 'neg_mean_squared_error')
    scores.append([alpha, score.mean()])
a=np.array(scores)
print("The best value obtained is for alpha equal to " + str(a[np.where(a==np.amax(a[:,1]))[0]][0,0]) + " with a MSE of "+ str(-a[np.where(a==np.amax(a[:,1]))[0]][0,1]))
alphas = [elt[0] for elt in scores]
MSE = [-elt[1] for elt in scores]
sns.lineplot(alphas, MSE)
_ = plt.title("Cross validation score")
_ = plt.ylabel("Mean Squared Error")
_ = plt.xlabel("alphas")
100%|████████████████████████████████████████████████████████████████████████████████| 400/400 [00:27<00:00, 14.30it/s]
The best value obtained is for alpha equal to 0.0 with a MSE of 0.00018993972148451834
In [75]:
#this means I choose my alpha as 0??
# We build our model with the chosen alpha.
model_top_production = Ridge(alpha=0)
model_top_production.fit(X_train_set, Y_train_set)
Out[75]:
Ridge(alpha=0, copy_X=True, fit_intercept=True, max_iter=None, normalize=False,
      random_state=None, solver='auto', tol=0.001)
In [76]:
weights_top_production = pd.DataFrame([model_top_production.coef_], columns=top_production_df.columns[1:])
weights_top_production = weights_top_production.sort_values(by=0, axis=1, ascending=False)                                  
weights_top_production
Out[76]:
Chickens Livestock production 1000 Head Cereals (Rice Milled Eqv) Crops Production tonnes Fruit excl Melons,Total Crops Production tonnes Cattle Livestock production Head Roots and Tubers,Total Crops Production tonnes Ducks Livestock production 1000 Head Sugar beet Crops Production tonnes Vegetables&Melons, Total Crops Production tonnes Buffaloes Livestock production Head Sheep and Goats Livestock production Head ... Grapes Crops Production tonnes Barley Crops Production tonnes Citrus Fruit,Total Crops Production tonnes Bananas Crops Production tonnes Cassava Crops Production tonnes Sorghum Crops Production tonnes Rice, paddy Crops Production tonnes Potatoes Crops Production tonnes Cattle and Buffaloes Livestock production Head Poultry Birds Livestock production 1000 Head
0 6.968226 1.049415 0.778406 0.530833 0.380118 0.236095 0.221196 0.198248 0.150289 0.119379 ... -0.127099 -0.133032 -0.137365 -0.168986 -0.171376 -0.270677 -0.348605 -0.55206 -0.786411 -7.247204

1 rows × 40 columns

In [77]:
model_top_production.score(X_train_set, Y_train_set)
Out[77]:
0.8402735698901358

The result doesn't seem to make sense!! probably because chickens and poultry are so highly correlated.

3.C.c.ii. Model - based on trade
In [78]:
#With MINMAX and not standardization, because it isn't a normal distribution.
top_traded_values = top_traded_df.values
min_max_scaler = preprocessing.MinMaxScaler()
top_traded_minmaxed_values = min_max_scaler.fit_transform(top_traded_values)
top_traded_minmaxed = pd.DataFrame(top_traded_minmaxed_values, columns=top_traded_df.columns)
top_traded_minmaxed.head(5)
Out[78]:
(GDP, million $) Cereals Food export quantities tonnes Wheat+Flour,Wheat Equivalent Food export quantities tonnes Wheat Food export quantities tonnes Fodder & Feeding stuff Food export quantities tonnes Maize Food export quantities tonnes Milk Equivalent Food export quantities tonnes Sugar and Honey Food export quantities tonnes Oilseeds Food export quantities tonnes Oilseed Cake Meal Food export quantities tonnes ... Cake, soybeans Food import quantities tonnes Animal Vegetable Oil Food import quantities tonnes Sugar,Total (Raw Equiv.) Food import quantities tonnes Beverages Food import quantities tonnes Fixed Vegetable Oils Food import quantities tonnes Cassava Equivalent Food import quantities tonnes Sugar Raw Centrifugal Food import quantities tonnes Rice Food import quantities tonnes Rice - total (Rice milled equivalent) Food import quantities tonnes Barley Food import quantities tonnes
0 0.000022 0.000052 0.000014 0.000004 0.0 0.000007 0.000031 0.0 0.000050 0.0 ... 0.00031 0.000090 0.000954 1.764093e-07 0.000106 0.000005 0.000092 0.000000 0.000000 0.000055
1 0.000023 0.000052 0.000014 0.000004 0.0 0.000007 0.000031 0.0 0.000024 0.0 ... 0.00031 0.000127 0.000818 2.940155e-07 0.000150 0.000005 0.000092 0.000079 0.000079 0.000055
2 0.000020 0.000052 0.000014 0.000004 0.0 0.000007 0.000031 0.0 0.000241 0.0 ... 0.00031 0.000157 0.000767 0.000000e+00 0.000185 0.000005 0.000092 0.000000 0.000000 0.000055
3 0.000022 0.000052 0.000014 0.000004 0.0 0.000007 0.000031 0.0 0.000193 0.0 ... 0.00031 0.000111 0.001009 1.764093e-07 0.000131 0.000005 0.000092 0.000000 0.000000 0.000055
4 0.000028 0.000052 0.000014 0.000004 0.0 0.000007 0.000031 0.0 0.000129 0.0 ... 0.00031 0.000119 0.000894 1.617085e-07 0.000141 0.000005 0.000092 0.000000 0.000000 0.000055

5 rows × 41 columns

In [79]:
train_set, test_set = train_test_split(top_traded_minmaxed.values, test_size = 0.2, random_state = 1)

X_train_set = train_set[:,1:]
Y_train_set = train_set[:,0]

X_test_set = test_set[:,1:]
Y_test_set = test_set[:,0]
In [80]:
number_of_folds = 5
scores = []
list_of_alpha = [i for i in np.arange(0,400,1)]
for alpha in tqdm(list_of_alpha):
    clf = Ridge(alpha = alpha)
    score = cross_val_score(clf, X_train_set, Y_train_set, cv=number_of_folds, scoring = 'neg_mean_squared_error')
    scores.append([alpha, score.mean()])
a=np.array(scores)
print("The best value obtained is for alpha equal to " + str(a[np.where(a==np.amax(a[:,1]))[0]][0,0]) + " with a MSE of "+ str(-a[np.where(a==np.amax(a[:,1]))[0]][0,1]))
alphas = [elt[0] for elt in scores]
MSE = [-elt[1] for elt in scores]
sns.lineplot(alphas, MSE)
_ = plt.title("Cross validation score")
_ = plt.ylabel("Mean Squared Error")
_ = plt.xlabel("alphas")
100%|████████████████████████████████████████████████████████████████████████████████| 400/400 [00:29<00:00, 13.67it/s]
The best value obtained is for alpha equal to 0.0 with a MSE of 0.00026671001602442595
In [81]:
# We build our model with the chosen alpha.
model_trade = Ridge(alpha=0)
model_trade.fit(X_train_set, Y_train_set)
Out[81]:
Ridge(alpha=0, copy_X=True, fit_intercept=True, max_iter=None, normalize=False,
      random_state=None, solver='auto', tol=0.001)
In [82]:
weights_trade = pd.DataFrame([model_trade.coef_], columns=top_traded_minmaxed.columns[1:])
weights_trade = weights_trade.sort_values(by=0, axis=1, ascending=False)                                  
weights_trade
Out[82]:
Rice - total (Rice milled equivalent) Food export quantities tonnes Fodder & Feeding stuff Food import quantities tonnes Fodder & Feeding stuff Food export quantities tonnes Rice - total (Rice milled equivalent) Food import quantities tonnes Fixed Vegetable Oils Food import quantities tonnes Maize Food export quantities tonnes Fixed Vegetable Oils Food export quantities tonnes Soybeans Food import quantities tonnes Wheat Food export quantities tonnes Total Meat Food export quantities tonnes ... Wheat Food import quantities tonnes Animal Vegetable Oil Food import quantities tonnes Milk Equivalent Food export quantities tonnes Oilseeds Food import quantities tonnes Animal Vegetable Oil Food export quantities tonnes Oilseed Cake Meal Food export quantities tonnes Rice Food import quantities tonnes Oilseed Cake Meal Food import quantities tonnes Cereals Food export quantities tonnes Rice Food export quantities tonnes
0 3.04501 1.034067 0.776319 0.537593 0.433523 0.411777 0.399938 0.396629 0.382785 0.32971 ... -0.12249 -0.142197 -0.149117 -0.382451 -0.3866 -0.390715 -0.581538 -1.10162 -1.149509 -2.92374

1 rows × 40 columns

In [83]:
model_trade.score(X_train_set, Y_train_set)
Out[83]:
0.7708287882425322

This needs to be cleaned, bc obviously the one with highest and lowest weights are actually super related.

3.D. Food prices stability and agricultural features

4. Informed plan for next actions

Our results seem pretty intersting to share to the world. Moreover we have nice interactive maps and we would like to focus more on visual and style than writting on methodology. Therefore, we would like to produce a data story.